Forward stagewise additive modeling for entity ranking in documents

  • WANG Yan-hua
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2016-12-01

  Online published: 2018-01-11

Abstract

Key entities of a document help summarize the subjects of the events or topics that the document describes, which benefits applications such as entity-oriented information retrieval and question answering. However, entities in free text are unordered, so it is important to rank the entities of a document. In this paper, we first make full use of the features of entities extracted from the document, and draw on Wikipedia and word embeddings to generate external features. We then propose a novel ranking model, LA-FSAM (FSAM based on the AUC metric and the logistic function), which is built on forward stagewise additive modeling (FSAM). In LA-FSAM, we employ the AUC (area under the curve) metric to construct the loss function and the logistic function to integrate the features of entities. Finally, stochastic gradient descent is used to optimize the parameters of the LA-FSAM model. Experimental evaluation shows the efficiency of the proposed model.
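
The abstract does not specify the base learner or the exact form of the AUC-based loss, so the following is only a minimal sketch under stated assumptions, not the LA-FSAM implementation. It assumes (1) that the non-differentiable AUC, the fraction of (key entity, non-key entity) pairs ranked correctly, is replaced by the pairwise logistic surrogate log(1 + exp(-(f(p) - f(n)))); (2) that each stage adds a base learner beta_m * sigmoid(w_m . x), so the logistic function combines an entity's features; and (3) that forward stagewise fitting freezes earlier stages and trains only the new (beta_m, w_m) by stochastic gradient descent. The function names `fit_stagewise_ranker`, `predict` and `empirical_auc`, the initialization, the hyperparameters and the toy features are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def predict(stages, x):
    """Additive score f(x) = sum_m beta_m * sigmoid(w_m . x) (assumed form)."""
    return sum(beta * sigmoid(dot(w, x)) for beta, w in stages)


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (key, non-key) pairs ranked correctly; ties count 1/2."""
    hits = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            hits += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return hits / (len(pos_scores) * len(neg_scores))


def fit_stagewise_ranker(pos, neg, n_stages=3, epochs=50, lr=0.1, seed=0):
    """Hypothetical forward stagewise ranker trained on one document.

    pos / neg: feature vectors of key / non-key entities.
    Stage m fits only (beta_m, w_m) by SGD on the pairwise logistic
    surrogate log(1 + exp(-(f(p) - f(n)))); earlier stages stay frozen.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(len(pos)) for j in range(len(neg))]
    stages = []
    for _ in range(n_stages):
        # Scores of the frozen model f_{m-1}, computed once per stage.
        prev_pos = [predict(stages, x) for x in pos]
        prev_neg = [predict(stages, x) for x in neg]
        # beta starts at 1 so the gradient w.r.t. w is nonzero at w = 0.
        beta = 1.0
        w = [0.0] * len(pos[0])
        for _ in range(epochs):
            rng.shuffle(pairs)  # stochastic order of training pairs
            for i, j in pairs:
                xp, xn = pos[i], neg[j]
                hp, hn = sigmoid(dot(w, xp)), sigmoid(dot(w, xn))
                d = prev_pos[i] - prev_neg[j] + beta * (hp - hn)
                g = -sigmoid(-d)  # derivative of log(1 + exp(-d)) w.r.t. d
                grad_beta = g * (hp - hn)
                grad_w = [g * beta * (hp * (1 - hp) * a - hn * (1 - hn) * b)
                          for a, b in zip(xp, xn)]
                beta -= lr * grad_beta
                w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        stages.append((beta, w))
    return stages


if __name__ == "__main__":
    # Toy document: two made-up features per entity.
    key = [[1.0, 0.2], [0.9, 0.8], [0.8, 0.1]]
    other = [[0.1, 0.7], [0.0, 0.3], [0.2, 0.9]]
    model = fit_stagewise_ranker(key, other)
    print(empirical_auc([predict(model, x) for x in key],
                        [predict(model, x) for x in other]))
```

Freezing `prev_pos` and `prev_neg` at the start of each stage is what makes this sketch stagewise rather than joint gradient descent over all parameters; ranking a document's entities then amounts to sorting them by `predict`.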

Cite this article

WANG Yan-hua. Forward stagewise additive modeling for entity ranking in documents[J]. Journal of East China Normal University (Natural Science), 2018, 2018(1): 91-102, 145. DOI: 10.3969/j.issn.1000-5641.2018.01.009
