Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (5): 14-23. doi: 10.3969/j.issn.1000-5641.2021.05.002

• Financial Knowledge Graph

Data augmentation technology for named entity recognition

Xiaoqin MA1, Xiaohe GUO1, Yufeng XUE1, Lin YANG2,*, Yuanzhe CHEN3

    1. Information and Communication Company, State Grid Qinghai Electric Power Company, Xining 810008, China
    2. Shanghai Development Center of Computer Software Technology, Shanghai 201112, China
    3. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received:2021-08-24 Online:2021-09-25 Published:2021-09-28
  • Contact: Lin YANG


Named entity recognition is the task of extracting instances of named entities from continuous natural language text. It plays an important role in information extraction and is closely related to other information extraction tasks. In recent years, deep learning methods have been widely applied to named entity recognition and have achieved good performance. The most common named entity recognition models use sequence tagging, which relies on the availability of a high-quality annotated corpus. However, annotating sequence data is expensive; as a result, training sets tend to be small, which seriously limits the final performance of named entity recognition models. To enlarge the training sets for named entity recognition without increasing the associated labor cost, this paper proposes a data augmentation method based on EDA, distant supervision, and bootstrap. Experiments on the FIND-2019 dataset show that the proposed data augmentation techniques, and combinations thereof, can significantly improve the overall performance of named entity recognition models.
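
The abstract does not say how the distant supervision step is implemented. As a rough illustration only, one common form of distant supervision for sequence tagging is to label raw text automatically by matching it against an entity dictionary. The sketch below assumes a greedy longest-match strategy and BIO tags; the function name `distant_label`, the gazetteer contents, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def distant_label(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in an entity dictionary.

    This is an assumed, simplified form of distant supervision, not the
    procedure used in the paper.
    """
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, so a multi-token entry
        # wins over a shorter entry that it contains.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in gazetteer:
                label = gazetteer[span]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical gazetteer and sentence, for illustration only.
GAZETTEER = {
    ("East", "China", "Normal", "University"): "ORG",
    ("Shanghai",): "LOC",
    ("China",): "LOC",
}

tokens = "East China Normal University is in Shanghai".split()
tags = distant_label(tokens, GAZETTEER)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Sentences labeled this way can be added to a manually annotated training set at no extra annotation cost, although dictionary matching generally yields noisier labels than human annotation.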

Key words: named entity recognition, data augmentation, EDA, distant supervision, Bootstrap
