Chinese entity relation extraction for enterprise knowledge graph construction

SUN Chen; FU Ying-nan; CHENG Wen-liang; QIAN Wei-ning

doi:10.3969/j.issn.1000-5641.2018.03.007

Journal of East China Normal University (Natural Science)

2018, Vol. 2018, Issue 3: 55-66

Computer Science

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

SUN Chen, female, master's student; research interest: knowledge graphs. E-mail: 2683122260@qq.com.

Received: 2017-08-19

Published online: 2018-05-29

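Funding

National Key Research and Development Program of China (2016YFB1000905); Joint Key Project of the National Natural Science Foundation of China and Guangdong Province (U1401256); National Natural Science Foundation of China (61672234, 61402177); East China Normal University Informatization Soft Science Research Project (41600-10201-562940/018).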

Chinese named entity relation extraction for enterprise knowledge graph construction

SUN Chen, FU Ying-nan, CHENG Wen-liang, QIAN Wei-ning

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2017-08-19

Online published: 2018-05-29

Abstract

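An enterprise knowledge graph is a vertical-domain knowledge base built for the financial sector to describe business relationships between enterprises. Although vertical-domain knowledge graphs cover less breadth than open knowledge graphs, they demand far higher accuracy. As a result, despite the considerable progress open knowledge graphs have made in recent years, they have not been applied in depth in vertical domains, particularly in business, where demand for enterprise knowledge graphs is strong. To address the current limitations of relation extraction for enterprise knowledge graphs, and following an analysis of the state of research on entity relation extraction, this paper proposes a classification-based method for Chinese entity relation extraction. The method uses a maximum entropy model; through experiments on announcements of listed companies, it identifies the optimal feature template for this relation extraction task, and accuracy on the company announcement data set generally exceeds 85%.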

Keywords: enterprise knowledge graph; entity relation extraction; maximum entropy model

Cite this article

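SUN Chen, FU Ying-nan, CHENG Wen-liang, QIAN Wei-ning. Chinese entity relation extraction for enterprise knowledge graph construction[J]. Journal of East China Normal University (Natural Science), 2018, 2018(3): 55-66. DOI: 10.3969/j.issn.1000-5641.2018.03.007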

Abstract

An enterprise knowledge graph is a vertical-domain knowledge base that describes business relationships between enterprises in the financial field. Although a vertical-domain knowledge graph covers less ground than an open knowledge graph, it demands far higher accuracy. Open knowledge graphs have advanced significantly in recent years, yet they have seen little in-depth application in vertical domains, especially business, where demand for enterprise knowledge graphs is high. To address the limitations of current relation extraction results, this paper proposes a classification-based method for Chinese entity relation extraction. The method uses a maximum entropy model and analyzes announcements of listed companies to determine the optimal feature template. The results show that accuracy generally exceeds 85% on the enterprise announcement data set.

Keywords: enterprise knowledge graph; named entity relation extraction; maximum entropy
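
To make the classification view of relation extraction concrete, here is a minimal sketch of a maximum entropy (multinomial logistic) classifier over features drawn from a feature template. It is an illustration under stated assumptions, not the paper's implementation: the relation labels, the toy sentences (English glosses standing in for segmented Chinese text, with `ORG1` and `ORG2` as entity placeholders), the template in `extract_features`, and the use of stochastic gradient ascent for training are all hypothetical choices made for this example.

```python
import math

# Hypothetical relation labels; the paper's actual label set is not given here.
LABELS = ["acquisition", "investment", "supply"]

def extract_features(tokens, e1, e2):
    """Hypothetical feature template: bias, words between the entities, distance."""
    lo, hi = sorted((e1, e2))
    feats = ["bias"]
    feats += ["between=" + w for w in tokens[lo + 1:hi]]
    feats.append("dist=%d" % (hi - lo))
    return feats

def predict_proba(weights, feats):
    """Maximum entropy form: p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}

def train(samples, epochs=200, lr=0.2):
    """Stochastic gradient ascent on the conditional log-likelihood (an assumption)."""
    weights = {}
    for _ in range(epochs):
        for feats, gold in samples:
            probs = predict_proba(weights, feats)
            for y in LABELS:
                # Gradient per active feature: observed count minus expected count.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] = weights.get((f, y), 0.0) + lr * grad
    return weights

# Toy training data: (tokens, index of entity 1, index of entity 2, relation).
TRAIN = [
    (["ORG1", "acquired", "ORG2"], 0, 2, "acquisition"),
    (["ORG1", "completed", "acquisition", "of", "ORG2"], 0, 4, "acquisition"),
    (["ORG1", "invested", "in", "ORG2"], 0, 3, "investment"),
    (["ORG1", "increased", "investment", "in", "ORG2"], 0, 4, "investment"),
    (["ORG1", "supplies", "parts", "to", "ORG2"], 0, 4, "supply"),
]
SAMPLES = [(extract_features(t, a, b), y) for t, a, b, y in TRAIN]
WEIGHTS = train(SAMPLES)

def classify(tokens, e1, e2):
    """Return the most probable relation label for an entity pair in a sentence."""
    probs = predict_proba(WEIGHTS, extract_features(tokens, e1, e2))
    return max(probs, key=probs.get)

print(classify(["ORG1", "invested", "heavily", "in", "ORG2"], 0, 4))
```

Comparing feature templates then amounts to swapping the body of `extract_features` and measuring accuracy on held-out entity pairs.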
