Journal of East China Normal University (Natural Science) ›› 2018, Vol. 2018 ›› Issue (5): 41-55. doi: 10.3969/j.issn.1000-5641.2018.05.004

• Review Article •

A survey of entity matching algorithms in heterogeneous networks

LI Na, JIN Gang-zeng, ZHOU Xiao-xu, ZHENG Jian-bing, GAO Ming

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2018-07-04 Online: 2018-09-25 Published: 2018-09-26
  • Corresponding author: ZHENG Jian-bing, male, Ph.D.; research interest: information processing technology. E-mail: zhengjb@js.chinamobile.com
  • About the author: LI Na, female, master's student; research interest: data mining. E-mail: nali0606@foxmail.com
  • Supported by:
    National Key Research and Development Program of China (2016YFB1000905); Key Program of the National Natural Science Foundation of China-Guangdong Joint Fund (U1401256); National Natural Science Foundation of China (61672234, 61502236, 61472321); Shanghai Municipal Project for Promoting Agriculture through Science and Technology (T20170303)

Abstract: The continuing convergence of the Internet, the Internet of Things, and cloud computing has raised the level of digitization across industries, but it has also led to data fragmentation. Fragmented data are massive, heterogeneous, privacy-sensitive, interdependent, and of low quality, which makes them hard to use: it is difficult to mine accurate and complete information from them. To use such data more effectively, entity matching, fusion, and disambiguation become particularly important. This paper surveys entity matching algorithms in heterogeneous networks. It reviews entity similarity measures and data preprocessing techniques; for massive data in particular, it outlines research progress on scalable entity matching methods; and it surveys entity matching algorithms built on two classes of techniques, supervised learning and unsupervised learning. The paper closes with a summary of research progress on entity matching and topics for future research.

Key words: data fusion, entity matching, record linkage, entity resolution

CLC number: