Journal of East China Normal University (Natural Science) ›› 2007, Vol. 2007 ›› Issue (5): 107-112.

• Computer Science •

A Domain Ontology-Based Algorithm for Automatic Document Summarization

王 麒, 江开忠, 杨 静, 顾君忠   

  1. Institute of Computer Applications, East China Normal University, Shanghai 200062
  • Received: 2006-12-15 Revised: 2007-03-23 Published: 2007-09-25 Released: 2007-09-25
  • Corresponding author: 顾君忠

Domain OntologyBased Document Automatic Summarization(Chinese)

WANG Qi, JIANG Kai-zhong, YANG Jing, GU Jun-zhong
  

  1. Institute of Computer Applications, East China Normal University, Shanghai 200062, China
  • Received: 2006-12-15 Revised: 2007-03-23 Online: 2007-09-25 Published: 2007-09-25
  • Contact: GU Jun-zhong

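Abstract: This paper presents an automatic document summarization algorithm built on the latent semantic analysis model and supported by a domain ontology. Starting from the traditional statistics-based singular value decomposition algorithm, the method uses the domain ontology to add document topic identification and concept similarity computation, which gives a better formal description of a document's main content. Guided by the document topic and concept similarity, statistical methods and heuristic rules extract the key sentences of the document as its summary. Experiments show that this improves summary quality.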

Keywords: ontology, automatic summarization, singular value decomposition

Abstract: A new algorithm based on the latent semantic analysis model and a domain ontology is proposed for document summarization. Building on the traditional statistical algorithm, document theme recognition and concept similarity computation are introduced through the domain ontology, which gives a better description of the main content of documents. Guided by the document theme and concept similarity, statistical approaches and heuristic rules are used to extract key sentences, and experiments show that this improves the quality of automatic summarization.

Key words: automatic summarization, singular value decomposition, ontology
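
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes whitespace-tokenized sentences, and it replaces the domain ontology with a hypothetical `CONCEPT_WEIGHT` table that boosts terms matching ontology concepts. A full singular value decomposition is replaced by power iteration, which recovers only the leading right singular vector of the weighted term-by-sentence matrix. All function names and weights are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical stand-in for a domain ontology: terms that map to ontology
# concepts receive a larger weight. The values are illustrative assumptions.
CONCEPT_WEIGHT = {"ontology": 2.0, "summarization": 2.0, "svd": 1.5}


def term_sentence_matrix(sentences, concept_weight):
    """Build a weighted term-by-sentence count matrix (one row per term)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]
    return [
        [counts[j][t] * concept_weight.get(t, 1.0) for j in range(len(sentences))]
        for t in vocab
    ]


def leading_right_singular_vector(a, iterations=200):
    """Power iteration on A^T A; approximates the dominant right singular vector."""
    n = len(a[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n)) for row in a]             # u = A v
        w = [sum(a[i][j] * u[i] for i in range(len(a))) for j in range(n)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def summarize(sentences, k=1, concept_weight=CONCEPT_WEIGHT):
    """Return the k sentences weighted most heavily in the leading topic, in document order."""
    a = term_sentence_matrix(sentences, concept_weight)
    v = leading_right_singular_vector(a)
    top = sorted(range(len(sentences)), key=lambda j: -abs(v[j]))[:k]
    return [sentences[j] for j in sorted(top)]


docs = [
    "ontology guides summarization with svd",
    "the weather is nice",
    "ontology improves summarization",
]
print(summarize(docs, k=2))
```

Chinese text would need a word segmenter in place of `str.split`, and a real system would derive the weights from concept similarity in the ontology rather than from a fixed table.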
