Review Articles

Topic model for graph mining based on hierarchical Dirichlet process

Haibin Zhang,

East China Normal University, Shanghai, People's Republic of China

Shang Huating,

East China Normal University, Shanghai, People's Republic of China

Xianyi Wu

East China Normal University, Shanghai, People's Republic of China

xywu@stat.ecnu.edu.cn

Pages 66-77 | Received 21 Oct. 2018, Accepted 07 Mar. 2019, Published online: 27 Mar. 2019

Abstract

In this paper, we propose a nonparametric Bayesian graph topic model (GTM) based on the hierarchical Dirichlet process (HDP). The HDP allows the number of topics to be selected flexibly, which removes the requirement that the number of topics be specified in advance. Moreover, the GTM relaxes the ‘bag of words’ assumption and takes the graph structure of the text into account. The resulting model, named HDP–GTM, combines the advantages of both. A variational inference algorithm is used for posterior inference, and the convergence of the algorithm is analysed. We apply the proposed model to text categorisation and compare it with three related topic models: latent Dirichlet allocation (LDA), GTM and HDP.
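To illustrate why a Dirichlet process prior does not require the number of topics to be fixed beforehand, the following is a minimal sketch of the standard stick-breaking construction of Dirichlet process weights. It is not the model or the inference algorithm of this paper: it shows only a single top-level Dirichlet process, and the function name `stick_breaking_weights`, the concentration value `gamma=2.0`, and the truncation tolerance `tol` are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(gamma, tol=1e-3, seed=0):
    """Draw Dirichlet process weights by stick-breaking.

    Illustrative assumption: sampling stops once the unassigned stick
    mass falls below `tol`, so the number of components is determined
    by the draw itself and is not supplied by the caller.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # mass of the stick not yet assigned to any topic
    while remaining > tol:
        frac = rng.betavariate(1.0, gamma)  # fraction broken off the remaining stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


weights, leftover = stick_breaking_weights(gamma=2.0)
print(len(weights), sum(weights) + leftover)
```

In this sketch, a larger `gamma` tends to spread the mass over more components, while a smaller value concentrates it on a few. In the hierarchical setting, document-level weights are drawn around a shared set of such top-level weights, which is what lets documents share topics.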

