Journal of East China Normal University (Natural Science) ›› 2018, Vol. 2018 ›› Issue (1): 91-102, 145. doi: 10.3969/j.issn.1000-5641.2018.01.009

• Computer Science •

Forward stagewise additive modeling for entity ranking in documents

WANG Yan-hua   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2016-12-01 Published: 2018-01-25 Online: 2018-01-11
  • About the author: WANG Yan-hua, male, master's student; research interest: machine learning. E-mail: yhwang917@gmail.com.
  • Supported by:
    Shanghai Science and Technology for Agriculture Promotion Project (2015, No. 3-2)

Abstract: The key entities of a document summarize the subjects of the events or topics that the document describes, which benefits applications such as entity-oriented information retrieval and question answering. However, the entities in free text are unordered, so ranking the entities of a document is important. In this paper, we first extract entity features from the document itself and derive external features from Wikipedia and distributed word representations (word embeddings). We then propose LA-FSAM (FSAM based on AUC Metric and Logistic Function), a ranking model built on forward stagewise additive modeling (FSAM). LA-FSAM uses the AUC (Area Under the Curve) criterion to construct its loss function and the logistic function to integrate entity features, and its parameters are solved by stochastic gradient descent. Experiments comparing LA-FSAM with baseline methods demonstrate the effectiveness of the proposed model.

Key words: entity ranking, forward stagewise additive modeling, area under the curve, logistic function, stochastic gradient descent
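
The abstract names the ingredients of LA-FSAM (an additive model fitted stage by stage, logistic functions over entity features, an AUC-based loss, and stochastic gradient descent) but does not give the formulation. The following is only a minimal sketch of how such ingredients might fit together, not the paper's model. Everything specific in it is an assumption made for illustration: the function names (`sigmoid`, `auc`, `base`, `fit_stagewise`, `score`), the pairwise surrogate loss log(1 + exp(-margin)) standing in for the non-differentiable AUC, the bias-free base learner, the hyperparameters, and the toy data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def base(w, x):
    # Hypothetical base learner: logistic function of a linear feature score.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def fit_stagewise(X, y, stages=3, steps=200, lr=0.5, seed=0):
    # Forward stagewise fitting: each stage adds one term beta * base(w, x)
    # while all earlier terms stay frozen.
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    f = [0.0] * len(X)  # current additive score of every entity
    model = []          # fitted (beta, w) terms
    for _ in range(stages):
        w = [rng.uniform(-0.1, 0.1) for _ in X[0]]
        beta = 1.0
        for _ in range(steps):
            # SGD step on one sampled (key entity, non-key entity) pair.
            i, j = rng.choice(pos), rng.choice(neg)
            hi, hj = base(w, X[i]), base(w, X[j])
            margin = (f[i] + beta * hi) - (f[j] + beta * hj)
            g = -sigmoid(-margin)  # derivative of log(1 + exp(-margin))
            grad_beta = g * (hi - hj)
            grad_w = [g * beta * (hi * (1 - hi) * a - hj * (1 - hj) * b)
                      for a, b in zip(X[i], X[j])]
            beta -= lr * grad_beta
            w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        # Freeze the new term and fold it into the running scores.
        f = [fi + beta * base(w, x) for fi, x in zip(f, X)]
        model.append((beta, w))
    return model


def score(model, x):
    # Additive ranking score: sum of all fitted terms.
    return sum(beta * base(w, x) for beta, w in model)


# Toy data (assumed): two features per entity, label 1 marks a key entity.
X = [[1.0, 0.9], [0.9, 0.8], [0.8, 1.0], [0.1, 0.2], [0.2, 0.0], [0.0, 0.1]]
y = [1, 1, 1, 0, 0, 0]
model = fit_stagewise(X, y)
print("training AUC:", auc([score(model, x) for x in X], y))
```

Sorting a document's entities by `score(model, x)` in descending order would then give the ranking. Sampling one positive and one negative entity per step keeps each update cheap, at the cost of noisier gradients than summing over all pairs.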
