Journal of East China Normal University (Natural Science) ›› 2020, Vol. 2020 ›› Issue (5): 113-130. doi: 10.3969/j.issn.1000-5641.202091006

• Semantic Extraction from Data •

Relation extraction via distant supervision technology

WANG Jianing1, HE Yi2, ZHU Renyu1, LIU Tingting1, GAO Ming1   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
  2. Shanghai Municipal Big Data Center, Shanghai 200072, China
  • Received: 2020-08-07 Published: 2020-09-24

Abstract: Relation extraction is a classic natural language processing task that is widely used in knowledge graph construction and completion, knowledge base question answering, and text summarization. It aims to extract the semantic relation that holds between a target entity pair in text. To construct large-scale supervised corpora efficiently, distant supervision was proposed: it annotates text automatically by aligning it with an existing knowledge base. However, its overly strong assumption gives rise to a series of challenges, which have attracted considerable research attention. This paper first introduces the theory of distant supervision relation extraction and its formal description. It then systematically analyzes existing methods and their respective pros and cons from three perspectives: noisy data, insufficient information, and data imbalance. Next, it explains and compares benchmark corpora and evaluation metrics. Finally, it highlights the open challenges for distant supervision relation extraction and discusses trends and directions for future research.
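
The automatic annotation step described in the abstract can be illustrated with a minimal sketch. It assumes the simplest form of alignment, exact string matching of entity names, and uses a hypothetical toy knowledge base and sentences that are not taken from the paper; it is meant only to show where the noisy labels come from, not to reproduce any method surveyed.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Apple", "Steve Jobs"): "founded_by",
}

def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB fact.

    This encodes the strong assumption behind distant supervision: any
    sentence containing both entities is taken to express their KB relation.
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            # Naive alignment by substring match (an illustrative assumption).
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled

SENTENCES = [
    "Barack Obama was born in Honolulu.",        # expresses born_in
    "Barack Obama visited Honolulu last week.",  # does not, but is labeled anyway
    "Steve Jobs introduced the iPhone.",         # only one entity: not labeled
]

LABELED = distant_label(SENTENCES, KB)
for sentence, head, tail, relation in LABELED:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

The second sentence receives the label born_in although it does not express that relation, which is the kind of noisy data that the abstract names as the first of its three perspectives.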

Key words: relation extraction, distant supervision, natural language processing, knowledge graph, noise processing

CLC Number: