Journal of East China Normal University (Natural Science) ›› 2013, Vol. 2013 ›› Issue (3): 118-130.

• Computational Advertising Technology •

Efficient implementation for LDA in Mahout

XU Bo-xi1, HU Ning2, CHEN Wen-bin1, GAO Wei-guo1, CHENG Jin1

  1. School of Mathematical Sciences, Fudan University, Shanghai 200433, China; 2. Shanghai MediaV Advertising Corporation, Ltd., Shanghai 200070, China
  • Received:2013-03-01 Revised:2013-04-01 Online:2013-05-25 Published:2013-07-10

Abstract: Through a careful study of the Latent Dirichlet Allocation (LDA)
algorithm with Gibbs sampling and of the MapReduce computing framework,
a distributed parallel implementation of LDA in Mahout was developed.
Experiments examined the computational performance of this distributed
parallel LDA program in detail and showed it to be high, and several key
issues affecting performance were discussed in depth.

Key words: Latent Dirichlet Allocation, Gibbs sampling, Mahout, distributed parallel computing, MapReduce framework

CLC Number: