Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (6): 147-160. doi: 10.3969/j.issn.1000-5641.2021.06.015

• Computer Science •

Unsupervised author name disambiguation based on heterogeneous networks

Chenliang GUO1, Xin LIN1,*, Yue YIN2

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
    2. Shanghai Technology Development Co., Ltd., Shanghai 200031, China
  • Received: 2020-09-18 Online: 2021-11-25 Published: 2021-11-26
  • Contact: Xin LIN, E-mail: xlin@cs.ecnu.edu.cn

Abstract:

Author name disambiguation is an important step in constructing an academic knowledge graph. Ambiguous author names are widespread in academic literature because of missing data, authors who share the same name, and abbreviated names. This paper proposes an unsupervised author name disambiguation method based on heterogeneous networks that addresses inadequate information utilization and the cold-start problem; the method automatically learns features of the papers that carry an ambiguous author name. First, the method preprocesses the author, organization, title, and keyword strings by lemmatization. It then learns embedded representations of text features using word2vec and TF-IDF, and embedded representations of structural features using meta-path random walks and word2vec. After the features are merged according to structural and text similarity, disambiguation is performed by DBSCAN clustering followed by the merging of isolated papers. Experimental results show that the proposed model significantly outperforms existing models on a small dataset and in engineering applications of cold-start unsupervised author name disambiguation. The results indicate that the model is effective and can be deployed in real-world applications.
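The structural step named in the abstract, a meta-path random walk over a heterogeneous network, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node types ("P" for paper, "A" for co-author, "O" for organization), the toy records, and the function names `build_graph` and `meta_path_walk` are all assumptions made for illustration, and only the standard library is used.

```python
import random
from collections import defaultdict


def build_graph(papers):
    """Link each paper node to its attribute nodes (and back).

    `papers` maps a paper id to {node_type: [values]}. The node types
    "A" (co-author) and "O" (organization) are illustrative assumptions.
    """
    neighbors = defaultdict(lambda: defaultdict(list))
    for paper_id, attrs in papers.items():
        paper = ("P", paper_id)
        for node_type, values in attrs.items():
            for value in values:
                node = (node_type, value)
                neighbors[paper][node_type].append(node)
                neighbors[node]["P"].append(paper)
    return neighbors


def meta_path_walk(neighbors, start, meta_path, length, rng):
    """Random walk whose node types follow `meta_path` cyclically.

    `meta_path` is a string such as "PAP" or "PAPOP"; it must begin and
    end with the same type so that it can be repeated. The walk stops
    early when the current node has no neighbor of the required type.
    """
    if len(meta_path) < 2 or meta_path[0] != meta_path[-1]:
        raise ValueError("meta_path must start and end with the same type")
    if start[0] != meta_path[0]:
        raise ValueError("start node type must match the meta-path")
    period = len(meta_path) - 1
    walk = [start]
    for step in range(1, length):
        wanted = meta_path[step % period]
        candidates = neighbors[walk[-1]][wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical records for one ambiguous name; the ambiguous author is
# left out of "A" so that it does not trivially connect every paper.
papers = {
    "p1": {"A": ["li_wei"], "O": ["ecnu"]},
    "p2": {"A": ["li_wei"], "O": []},
    "p3": {"A": ["zhao_min"], "O": ["ecnu"]},
    "p4": {"A": [], "O": []},  # isolated paper: no usable structure
}
graph = build_graph(papers)
walk = meta_path_walk(graph, ("P", "p1"), "PAP", 7, random.Random(0))
# Keeping only paper ids is an assumption about how sequences are used.
paper_sequence = [value for node_type, value in walk if node_type == "P"]
```

Sequences such as `paper_sequence` could then be passed to a word2vec implementation as "sentences" so that papers reachable through shared co-authors or organizations receive similar embeddings. A paper such as `p4`, which has no structural neighbors, yields a walk of length one, which suggests why a separate step for merging isolated papers is useful.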

Key words: author disambiguation, academic knowledge graph, heterogeneous network, meta-path random walk
