Journal of East China Normal University (Natural Science) ›› 2013, Vol. 2013 ›› Issue (6): 93-101.

• Computer Science •

Improved LDA model for microblog topic mining

XIE Hao, JIANG Hong

  1. Computer Center, East China Normal University, Shanghai 200062, China
  • Received: 2012-11-01 Revised: 2013-02-01 Online: 2013-11-25 Published: 2014-01-13

Abstract: With the continued growth in the number of Sina microblog users, microblog websites have become a platform where many people get information. However, a microblog is a special kind of text whose length is strictly limited, so traditional topic models cannot analyze microblog content well. This paper proposes RT-LDA, a microblog generation model based on LDA, to address the length restriction. The model is inferred with Gibbs sampling; it not only mines the topics of each microblog accurately but also summarizes the distribution of topics that users follow. Experiments on a real data set show that RT-LDA performs well at topic mining on microblogs.

Key words: Sina microblog, text mining, RT-LDA, Gibbs sampling
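
For readers unfamiliar with the inference method named in the abstract, the following is a minimal sketch of collapsed Gibbs sampling for standard LDA. It is not the RT-LDA model, whose details are not given on this page. The function name `lda_gibbs`, its parameters, and the hyperparameter defaults are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (an assumption: NOT RT-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic and per-topic word distributions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic assignment per token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates from the final counts.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi
```

Calling `lda_gibbs(docs, num_topics=2, vocab_size=4)` on a list of tokenized posts returns one topic distribution per post and one word distribution per topic. With very short documents the per-document counts are sparse, which is the limitation of standard LDA that the abstract points to.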

CLC Number: