Financial Knowledge Graph

Joint extraction of entities and relations for domain knowledge graph

  • Rui FU,
  • Jianyu LI,
  • Jiahui WANG,
  • Kun YUE,
  • Kuang HU
  • School of Information Science and Engineering, Yunnan University, Kunming 650500, China

Received date: 2021-08-05

  Online published: 2021-09-28

Funding

National Natural Science Foundation of China (U1802271); Major Science and Technology Project of Yunnan Province (202002AD080002-1-B); Yunnan Province Young Top Talent Program (C6193032); Scientific Research Fund of the Yunnan Provincial Department of Education (2020J0004)

Cite this article

Rui FU, Jianyu LI, Jiahui WANG, Kun YUE, Kuang HU. Joint extraction of entities and relations for domain knowledge graph[J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 24-36. DOI: 10.3969/j.issn.1000-5641.2021.05.003

Abstract

Entity and relation extraction from text data is the source for constructing and updating domain knowledge graphs. Text data in the financial technology domain contain overlapping relations, and the training data lack labeled samples. To address these problems, we propose a joint entity and relation extraction method that incorporates the idea of active learning. First, based on active learning, we incrementally select informative samples as training data. Next, we adopt a main-entity-oriented tagging strategy to transform joint entity and relation extraction into a sequence labeling problem. Finally, we perform joint extraction of domain entities and relations with an improved BERT-BiGRU-CRF model. The method provides supporting technology for knowledge graph construction and helps financial practitioners analyze, invest, and trade on the basis of domain knowledge, thereby reducing investment risk. Experiments on financial-domain text data show that the proposed method is effective and verify that it can subsequently be used to construct financial knowledge graphs.
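
The first two steps of the abstract can be illustrated with a small, hedged sketch. The abstract does not state which informativeness criterion or which tag set the authors use, so everything below is an assumption made for illustration: mean token entropy stands in for "informative", and the tag names (`MAIN`, `B-`/`I-` prefixes) and all function names are hypothetical. The model itself (BERT-BiGRU-CRF) is not reproduced here; the sketch only shows how sample selection and a main-entity-oriented tag sequence might look.

```python
import math
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def mean_token_entropy(token_probs: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy of per-token label distributions (assumed criterion)."""
    if not token_probs:
        return 0.0
    total = 0.0
    for dist in token_probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_probs)


def select_informative(pool: Dict[str, Sequence[Sequence[float]]], k: int) -> List[str]:
    """Return ids of the k unlabeled sentences with the highest mean entropy."""
    ranked = sorted(pool, key=lambda sid: mean_token_entropy(pool[sid]), reverse=True)
    return ranked[:k]


def encode_tags(length: int, main_entity: Span,
                objects: Sequence[Tuple[str, Span]]) -> List[str]:
    """Tag the main entity as MAIN and every other entity with its relation to it."""
    tags = ["O"] * length

    def mark(span: Span, label: str) -> None:
        start, end = span
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"

    mark(main_entity, "MAIN")
    for relation, span in objects:
        mark(span, relation)
    return tags


def decode_triples(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Recover (main entity, relation, object) triples from a tag sequence."""
    spans: List[Tuple[str, str]] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)
            continue
        if label is not None:  # close the span that was open
            spans.append((label, " ".join(words)))
        label, words = (tag[2:], [token]) if tag.startswith("B-") else (None, [])
    if label is not None:
        spans.append((label, " ".join(words)))
    mains = [text for lab, text in spans if lab == "MAIN"]
    if not mains:
        return []
    return [(mains[0], lab, text) for lab, text in spans if lab != "MAIN"]


# Hypothetical sentence in which one main entity takes part in two relations.
tokens = ["Acme", "Bank", "acquired", "Beta", "Pay", "and", "hired", "Jane", "Doe"]
tags = encode_tags(len(tokens), (0, 2), [("acquires", (3, 5)), ("employs", (7, 9))])
print(tags)
print(decode_triples(tokens, tags))
```

Under this assumed scheme, triples that share the main entity collapse into a single tag sequence, which is one way a sequence labeler could handle that kind of overlap. The sketch cannot represent a sentence with several main entities or an object that holds two relations with the same main entity; how the paper treats those cases is not stated in the abstract.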
