Data semantic extraction

Relation extraction via distant supervision technology

  • WANG Jianing,
  • HE Yi,
  • ZHU Renyu,
  • LIU Tingting,
  • GAO Ming
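  • 1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
    2. Shanghai Municipal Big Data Center, Shanghai 200072, China

Received date: 2020-08-07

Online published: 2020-09-24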

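Funding

National Key Research and Development Program of China (2016YFB1000905); National Natural Science Foundation of China (U1911203, U1811264, 61877018, 61672234, 61672384); Fundamental Research Funds for the Central Universities; Shanghai Agriculture Applied Technology Development Program (T20170303); Shanghai Key Laboratory of Pure Mathematics and Mathematical Practice (18dz2271000)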


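Abstract

Relation extraction is a classic natural language processing task that aims to extract the semantic relation between a target entity pair. It is widely used in knowledge graph construction and completion, knowledge base question answering, and text summarization. To build large-scale supervised corpora efficiently, distant supervision was proposed, which labels text automatically by aligning it with an existing knowledge base. The assumption behind this alignment is too strong, however, and the challenges it creates have attracted considerable research attention. This paper first introduces the concept and formal description of distantly supervised relation extraction. It then compares related methods and their strengths and weaknesses from three perspectives: noise, insufficient information, and data imbalance. Next, it explains and compares the evaluation datasets and metrics. Finally, it discusses the new challenges facing distantly supervised relation extraction and future research trends, and closes with a summary.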
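
The alignment step described in the abstract can be sketched as follows. This is only a minimal illustration, not the method of any surveyed paper: the toy knowledge base `KB`, the function name `distant_label`, and the substring-based entity matching are all assumptions made for the example. It labels every sentence that mentions both entities of a knowledge-base pair with that pair's relation, which also shows how the strong assumption produces noisy labels.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Steve Jobs", "Apple"): "founder_of",
    ("Paris", "France"): "capital_of",
}

LabeledInstance = Tuple[str, str, str, str]  # (sentence, head, tail, relation)


def distant_label(sentences: List[str],
                  kb: Dict[Tuple[str, str], str]) -> List[LabeledInstance]:
    """Label each sentence that mentions both entities of a KB pair.

    Entity matching is plain substring search (an assumption for brevity);
    a real pipeline would use named entity recognition and entity linking.
    """
    labeled: List[LabeledInstance] = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


corpus = [
    "Steve Jobs co-founded Apple in 1976.",  # expresses founder_of
    "Steve Jobs left Apple in 1985.",        # mentions the pair, but not the relation
    "Paris hosted a large conference.",      # only one entity of the pair, so skipped
]

for instance in distant_label(corpus, KB):
    print(instance)
```

The second sentence receives the label `founder_of` even though it does not express that relation, which is the kind of noisy label the surveyed methods try to handle.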

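Cite this article

WANG Jianing, HE Yi, ZHU Renyu, LIU Tingting, GAO Ming. Relation extraction via distant supervision technology[J]. Journal of East China Normal University (Natural Science), 2020, 2020(5): 113-130. DOI: 10.3969/j.issn.1000-5641.202091006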

