Financial Knowledge Graph

Joint extraction of entities and relations for domain knowledge graph

  • Rui FU,
  • Jianyu LI,
  • Jiahui WANG,
  • Kun YUE,
  • Kuang HU
  • School of Information Science and Engineering, Yunnan University, Kunming 650500, China

Received date: 2021-08-05

Online published: 2021-09-28

Abstract

Extraction of entities and relations from text data is used to construct and update domain knowledge graphs. In this paper, we propose a method for jointly extracting entities and relations that incorporates the concept of active learning; the proposed method addresses the overlap of vertical-domain data and the shortage of labeled samples that hamper traditional approaches on financial-technology text data. First, we incrementally select informative samples as the training data set. Next, we transform the task of joint extraction of entities and relations into a sequence labeling problem by labeling the main entities. Finally, we perform the joint extraction with an improved BERT-BiGRU-CRF model to construct a knowledge graph, thereby facilitating financial analysis, investment, and transaction operations based on domain knowledge and reducing investment risks. Experimental results on financial text data show the effectiveness of the proposed method and verify that it can be used to construct financial knowledge graphs.
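The abstract's first step, incrementally selecting informative samples, is typically realized in active learning by ranking unlabeled sentences by model uncertainty. The sketch below illustrates one common choice, mean per-token entropy; the paper does not specify its exact query strategy, so the function names (`token_entropy`, `select_informative`) and the toy probability model are illustrative assumptions, not the authors' implementation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_informative(unlabeled, predict_probs, k):
    """Rank unlabeled sentences by mean per-token entropy; return the top-k.

    predict_probs(sentence) must return one label distribution per token.
    """
    scored = []
    for sent in unlabeled:
        dists = predict_probs(sent)
        score = sum(token_entropy(d) for d in dists) / len(dists)
        scored.append((score, sent))
    scored.sort(key=lambda x: -x[0])  # most uncertain first
    return [s for _, s in scored[:k]]

# Toy stand-in for a tagger: one sentence the model is sure about,
# one it is maximally unsure about.
def toy_probs(sent):
    return [[0.99, 0.01]] if sent == "certain" else [[0.5, 0.5]]

picked = select_informative(["certain", "uncertain"], toy_probs, 1)
# the high-entropy sentence is queried for labeling first
```

In an incremental loop, the selected sentences would be labeled, added to the training set, and the tagger retrained before the next query round.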

Cite this article

Rui FU, Jianyu LI, Jiahui WANG, Kun YUE, Kuang HU. Joint extraction of entities and relations for domain knowledge graph [J]. Journal of East China Normal University (Natural Science), 2021, 2021(5): 24-36. DOI: 10.3969/j.issn.1000-5641.2021.05.003

References

1 LIU Q, LI Y, DUAN H, et al. Knowledge graph construction techniques [J]. Journal of Computer Research and Development, 2016, 53(3): 582-600.
2 LI J, WANG Z, WANG Y, et al. Research on distributed search technology of multiple data sources intelligent information based on knowledge graph [J]. Journal of Signal Processing Systems, 2021, 93(2): 239-248.
3 RAO Z, ZHANG Y, LIU J, et al. Recommendation methods and systems using knowledge graph [J/OL]. Acta Automatica Sinica, 2020. (2020-07-09)[2021-08-05]. https://doi.org/10.16383/j.aas.c200128.
4 LU X, PRAMANIK S, ROY R, et al. Answering complex questions by joining multi-document evidence with quasi knowledge graphs [C]//Proceedings of the 42nd International ACM SIGIR Conference. New York: ACM, 2019: 105-114.
5 LEHMANN J, ISELE R, JAKOB M, et al. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia [J]. Semantic Web, 2015, 6(2): 167-195.
6 MAHDISOLTANI F, BIEGA J, SUCHANEK F. YAGO3: A knowledge base from multilingual Wikipedias [C/OL]//Proceedings of the 7th Biennial Conference on Innovative Data Systems Research. 2015 [2021-08-05]. https://suchanek.name/work/publications/cidr2015.pdf.
7 BOLLACKER K, COOK R, TUFTS P. Freebase: A shared database of structured general human knowledge [C]//Proceedings of the 22nd AAAI Conference on Artificial Intelligence. California: AAAI, 2007: 1962-1963.
8 ELHAMMADI S, LAKSHMANAN L, NG R, et al. A high precision pipeline for financial knowledge graph construction [C]//Proceedings of the 28th International Conference on Computational Linguistics. 2020: 967-977.
9 YANG Y, WEI Z, CHEN Q, et al. Using external knowledge for financial event prediction based on graph neural networks [C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing: ACM, 2019: 2161-2164.
10 LONG J, YIN J, ZHU E, et al. A survey of active learning [J]. Journal of Computer Research and Development, 2008(S1): 300-304.
11 HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.
12 CHO K, MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation [C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Pennsylvania: ACL, 2014: 1724-1734.
13 ZENG D, LIU K, LAI S, et al. Relation classification via convolutional deep neural network [C]//Proceedings of the 25th International Conference on Computational Linguistics. Pennsylvania: ACL, 2014: 2335-2344.
14 XU Y, MOU L, GE L, et al. Classifying relations via long short term memory networks along shortest dependency paths [C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Pennsylvania: ACL, 2015: 1785-1794.
15 MIWA M, BANSAL M. End-to-end relation extraction using LSTMs on sequences and tree structures [C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Pennsylvania: ACL, 2016: 1105-1116.
16 ZHENG S, WANG F, BAO H, et al. Joint extraction of entities and relations based on a novel tagging scheme [C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Pennsylvania: ACL, 2017: 1227-1236.
17 ZENG X, ZENG D, HE S, et al. Extracting relational facts by an end-to-end neural model with copy mechanism [C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Pennsylvania: ACL, 2018: 506-514.
18 HOULSBY N, HUSZÁR F, GHAHRAMANI Z, et al. Bayesian active learning for classification and preference learning [EB/OL]. (2011-12-24)[2021-08-05]. https://arxiv.org/pdf/1112.5745.pdf.
19 TANG P, HUANG S. Self-paced active learning: Query the right thing at the right time [C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. California: AAAI, 2019: 5117-5124.
20 TRAN V, NGUYEN N, FUJITA H, et al. A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields [J]. Knowledge-Based Systems, 2017, 132: 179-187.
21 SHEN Y, YUN H, LIPTON Z, et al. Deep active learning for named entity recognition [EB/OL]. (2018-02-04)[2021-09-08]. https://arxiv.org/pdf/1707.05928.pdf.
22 DEVLIN J, CHANG M, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Pennsylvania: ACL, 2019: 4171-4186.
23 RIEDEL S, YAO L, MCCALLUM A. Modeling relations and their mentions without labeled text [C]//Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin: Springer, 2010: 148-163.
24 YU K, FU Y, DONG Q. Research progress of distributed word vectors based on neural network language models [J]. Journal of East China Normal University (Natural Science), 2017(5): 52-65.