[1] WU X, ZHU X, WU G Q, et al. Data mining with big data[J]. IEEE transactions on knowledge and data engineering, 2014, 26(1):97-107.
[2] 赵国栋. 大数据时代的历史机遇:产业变革与数据科学[M]. 北京:清华大学出版社, 2013.
[3] GU B, LI Z, ZHANG X, et al. The interaction between schema matching and record matching in data integration[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(1):186-199.
[4] LI C, JIN L, MEHROTRA S. Supporting efficient record linkage for large data sets using mapping techniques[J]. World Wide Web, 2006, 9(4):557-584.
[5] WHANG S E, GARCIA-MOLINA H. Incremental entity resolution on rules and data[J]. The VLDB Journal, 2014, 23(1):77-102.
[6] GETOOR L, MACHANAVAJJHALA A. Entity resolution:theory, practice & open challenges[J]. Proceedings of the VLDB Endowment, 2012, 5(12):2018-2019.
[7] TEJADA S, KNOBLOCK C A, Minton S. Learning Object Identification Rules for Information Integration[J]. Information Systems, 2001, 26(8):607-633.
[8] LEITAO L, CALADO P, HERSCHEL M. Efficient and effective duplicate detection in hierarchical data[J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(5):1028-1041.
[9] DUNN H L. Record linkage[J]. American Journal of Public Health and the Nations Health, 1946, 36(12):1412-1416.
[10] LIU S, WANG S, ZHU F, et al. Hydra:Large-scale social identity linkage via heterogeneous behavior modeling[C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 2014:51-62.
[11] LIU J, LEI K H, LIU J Y, et al. Ranking-based name matching for author disambiguation in bibliographic data[C]//Proceedings of the 2013 KDD Cup 2013 Workshop. ACM, 2013:Article No 8.
[12] KONG C, GAO M, XU C, et al. Entity matching across multiple heterogeneous data sources[C]//International Conference on Database Systems for Advanced Applications. Cham:Springer, 2016:133-146.
[13] KEJRIWAL M, MIRANKER D P. Semi-supervised instance matching using boosted classifiers[C]//European Semantic Web Conference. Cham:Springer, 2015:388-402.
[14] KONDA P, DAS S, SUGANTHAN GC P, et al. Magellan:Toward building entity matching management systems[J]. Proceedings of the VLDB Endowment, 2016, 12(9):1197-1208.
[15] WHANG S E, MENESTRINA D, KOUTRIKA G, et al. Entity resolution with iterative blocking[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 2009:219-232.
[16] KARAPIPERIS D, VERYKIOS V S. An LSH-based blocking approach with a homomorphic matching technique for privacy-preserving record linkage[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(4):909-921.
[17] GRUENHEID A, DONG X L, SRIVASTAVA D. Incremental record linkage[J]. Proceedings of the VLDB Endowment, 2014, 9(7):697-708.
[18] KIM H, LEE D. HARRA:Fast iterative hashed record linkage for large-scale data collections[C]//Proceedings of the 13th International Conference on Extending Database Technology. ACM, 2010:525-536.
[19] ZHANG Y, TANG J, YANG Z, et al. Cosnet:Connecting heterogeneous social networks with local and global consistency[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015:1485-1494.
[20] CHRISTEN P. A survey of indexing techniques for scalable record linkage and deduplication[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9):1537-1555.
[21] WANG J, LI G, YU J X, et al. Entity matching:How similar is similar[J]. Proceedings of the VLDB Endowment, 2011, 10(4):622-633.
[22] SONG Y, KIMURA T, BATJARGAL B, et al. Cross-language record linkage using word embedding driven metadata similarity measurement[C/OL]//International Semantic Web Conference (Posters & Demos). (2016-09-28)[2018-06-01]. http://ceur-ws.org/vol-1690/paper90.pdf.
[23] YANG K H, PENG H T, JIANG J Y, et al. Author name disambiguation for citations using topic and web correlation[C]//International Conference on Theory and Practice of Digital Libraries. Berlin:Springer, 2008:185-196.
[24] VAN Der LOO M P J. The stringdist package for approximate string matching[J]. The R Journal, 2014, 6(1):111-122.
[25] LI C L, SU Y C, LIN T W, et al. Combination of feature engineering and ranking models for paper-author identification in KDD Cup 2013[C]//Proceedings of the 2013 KDD Cup 2013 Workshop. ACM, 2013:Article No 2.
[26] PELED O, FIRE M, ROKACH L, et al. Entity matching in online social networks[C]//Social Computing (SocialCom), 2013 International Conference on. IEEE, 2013:339-344.
[27] LI J, LIANG X, DING W, et al. Feature engineering and tree modeling for author-paper identification challenge[C]//Proceedings of the 2013 KDD Cup 2013 Workshop. ACM, 2013:Article No 5.
[28] PELED O, FIRE M, ROKACH L, et al. Matching entities across online social networks[J]. Neurocomputing, 2016, 210(C):91-106.
[29] TANG J, FONG A C M, WANG B, et al. A unified probabilistic framework for name disambiguation in digital library[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(6):975-987.
[30] KOUDAS N, SARAWAGI S, SRIVASTAVA D. Record linkage:similarity measures and algorithms[C]//Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. ACM, 2006:802-803.
[31] VESDAPUNT N, GARCIA-MOLINA H. Identifying users in social networks with limited information[C]//Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE, 2015:627-638.
[32] LEE J Y, HUSSAIN R, RIVERA V, et al. Second-level degree-based entity resolution in online social networks[J/OL]. Social Network Analysis and Mining, 2018, 8:19. (2018-03-16)[2018-06-01]. https://link.springer.com/content/pdf/10.1007%2Fs13278-018-0499-9.pdf.
[33] ZHOU X, LIANG X, ZHANG H, et al. Cross-platform identification of anonymous identical users in multiple social media networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(2):411-424.
[34] ZHANG J, YU P S, ZHOU Z H. Meta-path based multi-network collective link prediction[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014:1286-1295.
[35] LEVIN F H, HEUSER C A. Evaluating the use of social networks in author name disambiguation in digital libraries[J]. Journal of Information and Data Management, 2010(2):183-197.
[36] POOJA K M, MONDAL S, CHANDRA J. An unsupervised heuristic based approach for author name disambiguation[C]//Communication Systems and Networks (COMSNETS), 201810th International Conference on. IEEE, 2018:540-542.
[37] ZHONG E, LI L, WANG N, et al. Contextual rule-based feature engineering for author-paper identification[C]//Proceedings of the 2013 KDD Cup 2013 Workshop. ACM, 2013:Article No 6.
[38] PEROZZI B, AL-RFOU r, SKIENA S. Deepwalk:Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014:701-710.
[39] GROVER A, LESKOVEC J. node2vec:Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2016:855-864.
[40] TANG J, QU M, WANG M, et al. Line:Large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web, International World Wide Web, Conferences Steering Committee, 2015:1067-1077.
[41] ZHOU X, LIANG X, DU X, et al. Structure based user identification across social networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6):1178-1191.
[42] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint, arXiv:1301.3781v3[cs.CV] 7 Sep 2013.
[43] LIU S, WANG S, ZHU F. Structured learning from heterogeneous behavior for social identity linkage[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7):2005-2019.
[44] DEY D. Entity matching in heterogeneous databases:A logistic regression approach[J]. Decision Support Systems, 2008, 44(3):740-747.
[45] ZAFARANI R, LIU H. Connecting users across social media sites:a behavioral-modeling approach[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013:41-49.
[46] ZHOU Y, HOWROYD J, DANICIC S, et al. Extending naive bayes classifier with hierarchy feature level information for record linkage[C]//Workshop on Advanced Methodologies for Bayesian Networks. Cham:Springer, 2015:93-104.
[47] LI L, LI J, GAO H. Rule-based method for entity resolution[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(1):250-263.
[48] CHIN W S, ZHUANG Y, JUAN Y C, et al. Effective string processing and matching for author disambiguation[J]. The Journal of Machine Learning Research, 2014, 15(1):3037-3064.
[49] JIANG Y, LIN C, MENG W, et al. Rule-based deduplication of article records from bibliographic databases[J]. Database, 2014:Article ID bat086. DOI:10.1093/database/bat086.
[50] SETOGUCHI S, ZHU Y, JALBERT J J, et al. Validity of deterministic record linkage using multiple indirect personal identifiers:linking a large registry to claims data[J]. Circulation:Cardiovascular Quality and Outcomes, 2014, 7(3):475-480.
[51] EKTEFA M, JABAR M A, Sidi F, et al. A threshold-based similarity measure for duplicate detection[C]//Open Systems (ICOS), 2011 IEEE Conference on. IEEE, 2011:37-41.
[52] MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]//Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967:281-297.
[53] MONIKA S, ANAND C, GNANAMURTHY R K. Analyzing the User Profile Linkage across Different Social Network Platforms[J]. International Journal of Computer Science and Engineering, 2016, 4(2):1378-1383.
[54] CEN L, DRAGUT E C, SI L, et al. Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion[C]//Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2013:741-744.
[55] SONG Y, HUANG J, COUNCILL I G, et al. Efficient topic-based unsupervised name disambiguation[C]//Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, 2007:342-351.
[56] QIAN Y, HU Y, CUI J, et al. Combining machine learning and human judgment in author disambiguation[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011:1241-1246.
[57] FELLEGI I P, SUNTER A B. A theory for record linkage[J]. Journal of the American Statistical Association, 1969, 64(328):1183-1210.
[58] WINKLER W E. Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage[C]//Proceedings of the Section on Survey Research Methods, American Statistical Association. 1988:667-671.
[59] BOYD S, VANDENBERGHE L. Convex Optimization[M]. Cambridge:Cambridge University Press, 2004.
[60] GAO M, LIM E P, LO D, et al. CNL:Collective network linkage across heterogeneous social platforms[C]//IEEE International Conference on Data Mining. IEEE Computer Society, 2015:757-762.
[61] YAKOUT M, ELMAGARMID A K, Elmeleegy H, et al. Behavior based record linkage[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2):439-448.
[62] FANG Y, CHANG M W. Entity linking on microblogs with spatial and temporal signals[J]. Transactions of the Association for Computational Linguistics, 2014, 2(1):259-272.
[63] HAN H, XU W, ZHA H, et al. A hierarchical naive Bayes mixture model for name disambiguation in author citations[C]//Proceedings of the 2005 ACM Symposium on Applied Computing. ACM, 2005:1065-1069.
[64] SINGLA P, DOMINGOS P. Collective object identification[C]//International Joint Conference on Artificial Intelligence.[S.l.]. Morgan Kaufmann Publishers Inc, 2005:1636-1637.
[65] BALAJI J, MIN C, JAVED F, et al. Avatar:Large scale entity resolution of heterogeneous user profiles[C]//Proceedings of the 2nd Workshop on Data Management for End-To-End Machine Learning. ACM, 2018:Article No 2.
[66] BLEI D M, NG A Y, JORDAN M I. Latent dirichlet allocation[J]. Journal of machine Learning Research, 2003(3):993-1022.
[67] ZHAO W X, JIANG J, WENG J, et al. Comparing twitter and traditional media using topic models[C]//European Conference on Information Retrieval. Berlin:Springer, 2011:338-349.
[68] YANG Y, SUN Y, TANG J, et al. Entity matching across heterogeneous sources[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015:1395-1404.
[69] BHATTACHARYA I, GETOOR L. A latent dirichlet model for unsupervised entity resolution[C]//Proceedings of the 2006 SIAM International Conference on Data Mining,[S.l.]:Society for Industrial and Applied Mathematics, 2006:47-58.
[70] SEN P. Collective context-aware topic models for entity disambiguation[C]//Proceedings of the 21st International Conference on World Wide Web. ACM, 2012:729-738.
[71] TANG J, FANG Z, SUN J. Incorporating social context and domain knowledge for entity recognition[C]//Proceedings of the 24th International Conference on World Wide Web.[S.l.]:International World Wide Web Conferences Steering Committee, 2015:517-526.
[72] STEORTS R C, HALL R, FIENBERG S E. A bayesian approach to graphical record linkage and deduplication[J]. Journal of the American Statistical Association, 2016, 516(111):1660-1672.
[73] MAURYA A, TELANG R. Bayesian multi-view models for member-job matching and personalized skill recommendations[C]//Big Data, 2017 IEEE International Conference on. IEEE, 2017:1193-1202.