Journal of East China Normal University (Natural Sc ›› 2018, Vol. 2018 ›› Issue (5): 41-55. doi: 10.3969/j.issn.1000-5641.2018.05.004


A survey of entity matching algorithms in heterogeneous networks

LI Na, JIN Gang-zeng, ZHOU Xiao-xu, ZHENG Jian-bing, GAO Ming   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2018-07-04 Online: 2018-09-25 Published: 2018-09-26

Abstract: The continuing integration of Internet, Internet of Things, and cloud computing technologies has advanced digitization across industries, but it has also increased data fragmentation. Fragmented data is characterized by large volume, heterogeneity, privacy sensitivity, dependence, and low quality, which results in poor data availability. As a result, it is often difficult to obtain accurate and complete information for analytical tasks. To make effective use of such data, entity matching, fusion, and disambiguation are of particular significance. In this paper, we survey data preprocessing, similarity measures, and entity matching algorithms for heterogeneous networks. For large datasets in particular, we also examine scalable entity matching algorithms. Existing entity matching algorithms fall into two groups: supervised and unsupervised learning-based algorithms. We conclude with a summary of research progress on entity matching and topics for future research.

Key words: data fusion, entity matching, record linkage, entity resolution
