Journal of East China Normal University(Natural Science) ›› 2021, Vol. 2021 ›› Issue (6): 147-160.doi: 10.3969/j.issn.1000-5641.2021.06.015
• Computer Science •
Chenliang GUO1, Xin LIN1,*, Yue YIN2
Received: 2020-09-18
Online: 2021-11-25
Published: 2021-11-26
Contact: Xin LIN, E-mail: xlin@cs.ecnu.edu.cn
CLC Number:
Chenliang GUO, Xin LIN, Yue YIN. Unsupervised author name disambiguation based on heterogeneous networks[J]. Journal of East China Normal University(Natural Science), 2021, 2021(6): 147-160.
Table 1
Data of papers by the author Hongbin LIANG
ID | Keywords | Affiliation | Other authors |
| adaptive computing allocation, mobile cloud | School of Transportation and Logistics | XING Tianyi; CAI Lin; HUANG Dijiang; PENG Daiyuan; LIU Yan |
| adaptive channel allocation, wireless algorithm | National United Engineering Laboratory of Integrated and Intelligent Transportation School | ZHANG Jin; LI Wei; GULLIVER Aaron |
| network coding DVB-IPDC LTE | Secure Networking and Computing Institute, Arizona State University | WANG Lian; PENG Daiyuan |
| 5-axis machining G code, Interpolation NURBS | College of Mechanical Engineering | LI Xia |
| 5-axis NURBS surfaces STEP-NC | College of Mechanical and Electrical Engineering | LI Xia |
Table 2
Statistics on the datasets
Dataset | Author names | Papers | Authors after disambiguation | Papers missing affiliation | Papers missing abstract | Papers missing keywords |
AMiner | 600 | 203078 | 39781 | 134 (0.06%) | 3118 (1.53%) | 49132 (24%) |
AMiner training set | 500 | 169720 | 33382 | 114 (0.06%) | 2647 (1.56%) | 41299 (24%) |
AMiner test set | 100 | 35023 | 6399 | 20 (0.06%) | 488 (1.39%) | 8286 (24%) |
SCI | 13328290 | 18138796 | 14279136 | 830942 (5%) | 4584943 (25%) | 7748879 (43%) |
SCI test set | 10 | 184 | 44 | 11 (6%) | 62 (33%) | 81 (44%) |
Table 3
Number of papers for selected authors
AMiner author | Papers | Authors | Papers per author | SCI author | Papers | Authors | Papers per author | |
XU Xu | 699 | 122 | 5.7 | ABBAS Hazzim | 12 | 5 | 2.4 | |
YU Rong | 378 | 82 | 4.6 | AALKJAER Christian | 20 | 1 | 20.0 | |
TIAN Yong | 382 | 98 | 3.9 | ABEL Robert | 16 | 1 | 16.0 | |
HAN Lu | 511 | 129 | 4.0 | AARABI Mahmoud | 11 | 2 | 5.5 | |
HUANG Lin | 749 | 168 | 4.5 | AAMIR Muhammad | 44 | 6 | 7.3 | |
XU Kexin | 220 | 14 | 15.7 | ABE Yuki | 20 | 8 | 2.5 | |
QUAN Wei | 229 | 38 | 6.0 | ABBASI Shawn | 14 | 4 | 3.5 | |
DENG Tao | 398 | 71 | 5.6 | ABE Kazuo | 15 | 2 | 7.5 | |
LI Hongbin | 354 | 65 | 5.4 | ABDULLAH Amin | 19 | 8 | 2.4 | |
BAI Hua | 447 | 72 | 6.2 | ABAB Julia | 13 | 7 | 1.9 | |
CHEN Meiling | 200 | 200 | 5.3 | |||||
WANG Yanqing | 196 | 196 | 3.0 | |||||
ZHANG Xudong | 314 | 314 | 4.6 | |||||
SHI Qiang | 356 | 93 | 3.8 | |||||
ZHENG Min | 654 | 122 | 5.4 |
Table 4
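Experimental results on the AMiner test set
Author name | Proposed method | AMiner method[ | OAG competition first-place method | Probabilistic model method[ | GHOST method[ |
| Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% |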
XU Xu | 55.85 | 70.83 | 62.46 | 45.86 | 74.18 | 56.68 | 60.39 | 60.69 | 60.54 | 41.87 | 48.16 | 44.80 | 21.79 | 61.34 | 32.15 | ||||
YU Rong | 40.31 | 85.08 | 54.70 | 46.51 | 89.13 | 61.12 | 53.55 | 71.49 | 61.23 | 40.85 | 65.48 | 50.32 | 36.41 | 92.00 | 52.17 | ||||
TIAN Yong | 62.96 | 80.01 | 70.47 | 51.95 | 76.32 | 61.82 | 63.93 | 39.35 | 48.71 | 56.85 | 70.74 | 63.04 | 54.58 | 86.94 | 67.60 | ||||
HAN Lu | 22.39 | 64.63 | 33.26 | 28.05 | 51.78 | 36.39 | 48.31 | 42.47 | 45.21 | 20.62 | 47.88 | 28.82 | 17.39 | 69.72 | 27.84 | ||||
HUANG Lin | 44.52 | 86.29 | 58.74 | 32.87 | 77.10 | 46.09 | 59.17 | 69.84 | 64.07 | 34.17 | 71.84 | 46.31 | 17.25 | 86.15 | 28.74 | ||||
XU Kexin | 95.12 | 80.71 | 87.32 | 98.64 | 91.37 | 94.87 | 99.65 | 80.46 | 89.04 | 82.47 | 90.02 | 86.08 | 28.52 | 92.90 | 43.64 | ||||
QUAN Wei | 39.78 | 83.81 | 53.95 | 39.02 | 53.88 | 45.26 | 56.55 | 62.09 | 59.19 | 47.66 | 64.45 | 54.77 | 27.80 | 86.42 | 42.07 | ||||
DENG Tao | 42.13 | 80.22 | 54.24 | 43.62 | 81.63 | 56.86 | 54.18 | 60.53 | 57.18 | 29.89 | 53.04 | 38.23 | 24.50 | 73.33 | 36.73 | ||||
LI Hongbin | 70.84 | 89.10 | 78.93 | 69.21 | 77.20 | 72.99 | 72.16 | 56.64 | 63.47 | 53.05 | 54.66 | 53.84 | 29.12 | 56.29 | 38.39 | ||||
BAI Hua | 33.81 | 83.24 | 48.09 | 39.73 | 71.49 | 51.08 | 35.84 | 63.92 | 45.93 | 35.90 | 58.58 | 44.52 | 29.54 | 83.06 | 43.58 | ||||
CHEN Meiling | 45.07 | 92.79 | 60.67 | 44.70 | 74.93 | 55.99 | 49.57 | 54.17 | 51.77 | 28.80 | 59.36 | 38.79 | 23.85 | 86.11 | 37.35 | ||||
WANG Yanqing | 59.32 | 64.81 | 91.95 | 75.33 | 71.52 | 73.37 | 93.58 | 16.07 | 27.43 | 51.97 | 60.40 | 55.87 | 40.39 | 80.79 | 53.86 | ||||
ZHANG Xudong | 7.69 | 61.61 | 13.67 | 22.54 | 62.40 | 33.12 | 10.69 | 43.28 | 17.14 | 23.35 | 70.20 | 35.04 | 7.23 | 85.75 | 13.34 | ||||
SHI Qiang | 47.07 | 53.60 | 50.12 | 36.15 | 52.20 | 42.72 | 51.77 | 47.66 | 49.63 | 36.94 | 43.84 | 40.10 | 26.80 | 53.72 | 35.76 | ||||
ZHENG Min | 11.56 | 45.13 | 18.40 | 22.35 | 57.65 | 32.21 | 16.48 | 26.05 | 20.19 | 19.70 | 54.76 | 28.98 | 15.21 | 80.50 | 25.58 | ||||
Average of 15 authors | 45.23 | 74.79 | 56.37 | 46.44 | 70.85 | 56.10 | 50.05 | 52.98 | 51.47 | 40.27 | 60.89 | 48.48 | 26.69 | 78.33 | 39.81 | ||||
Average of 100 authors | 65.77 | 75.21 | 70.17 | 63.03 | 77.96 | 67.79 | 73.36 | 60.14 | 66.10 | 59.53 | 70.63 | 62.81 | 57.09 | 77.22 | 50.23 |
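The third value reported for each method in Table 4 is consistent with the F1 score, the harmonic mean of recall and precision. The sketch below is a minimal check of that reading for the proposed method's two average rows; the function name `f1_score` and the `rows` dictionary are illustrative and do not come from the paper.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same units as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported third value) for the proposed method,
# copied from the two average rows of Table 4.
rows = {
    "average of 15 authors": (45.23, 74.79, 56.37),
    "average of 100 authors": (65.77, 75.21, 70.17),
}

for name, (recall, precision, reported) in rows.items():
    computed = f1_score(recall, precision)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

For these two rows the computed and reported values agree to two decimals. Per-author rows can differ in the last digit, presumably because the published recall and precision are already rounded.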
Table 5
Experimental results on the SCI test set
Author name | Recall/% | Precision/% | F1/% |
ABBAS Hazzim | 100.00 | 100.00 | 100.00 |
AALKJAER Christian | 90.00 | 100.00 | 94.74 |
ABEL Robert | 93.69 | 43.24 | 59.18 |
AARABI Mahmoud | 100.00 | 100.00 | 100.00 |
AAMIR Muhammad | 87.50 | 100.00 | 93.33 |
ABE Yuki | 83.33 | 100.00 | 90.91 |
ABBASI Shawn | 100.00 | 100.00 | 100.00 |
ABE Kazuo | 87.76 | 100.00 | 93.48 |
ABDULLAH Amin | 65.22 | 48.39 | 55.56 |
ABAB Julia | 80.00 | 66.67 | 72.73 |
Average | 88.75 | 85.83 | 87.27 |
Table 6
Comparison of results when parts of the model are removed
Model variant | AMiner test set (100 authors) | AMiner training set (500 authors) | AMiner overall average |
| Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% | Recall/% | Precision/% | F1/% |
Full model | 65.77 | 75.21 | 70.17 | 64.56 | 70.21 | 67.27 | 64.76 | 71.04 | 67.75 |
Structural features only | 60.10 | 65.65 | 62.75 | 57.43 | 63.08 | 60.12 | 57.88 | 63.50 | 60.56 |
Text features only | 87.65 | 41.04 | 55.90 | 86.71 | 38.41 | 53.24 | 86.87 | 38.85 | 53.69 |
Without lemmatization | 61.25 | 77.97 | 68.61 | 59.08 | 75.83 | 66.42 | 59.44 | 76.19 | 66.78 |
Without TF-IDF weighting | 62.99 | 75.72 | 68.77 | 62.99 | 69.79 | 66.22 | 62.99 | 70.78 | 66.66 |
Without random shuffling of word vectors | 55.92 | 78.76 | 65.40 | 54.49 | 76.46 | 63.63 | 54.73 | 76.84 | 63.92 |
Without keywords | 63.00 | 75.79 | 68.81 | 62.90 | 69.61 | 66.08 | 62.92 | 70.64 | 66.56 |
Without source | 62.24 | 77.32 | 68.96 | 59.35 | 74.53 | 66.07 | 59.83 | 75.00 | 66.56 |
Without abstract | 61.18 | 77.24 | 68.28 | 58.75 | 75.19 | 65.97 | 59.16 | 75.53 | 66.35 |
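The last three columns of Table 6 appear to be a mean of the test-set and training-set results weighted by the number of author names (100 and 500). The sketch below checks that reading against the first row; the helper `weighted_mean` and the weighting itself are assumptions inferred from the numbers, not something the page states.

```python
def weighted_mean(test_value, train_value, n_test=100, n_train=500):
    """Mean of two per-set scores weighted by the number of author names."""
    return (test_value * n_test + train_value * n_train) / (n_test + n_train)


# First row of Table 6: (test-set value, training-set value, reported average).
first_row = {
    "recall": (65.77, 64.56, 64.76),
    "precision": (75.21, 70.21, 71.04),
}

for metric, (test_value, train_value, reported) in first_row.items():
    combined = weighted_mean(test_value, train_value)
    print(f"{metric}: computed {combined:.2f}, reported {reported:.2f}")
```

Under this assumption the first row is reproduced to two decimals, which suggests the overall figures are averaged per author name rather than per paper.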
Table 7
Comparison of results for different values of the variable e
Weight | Recall/% | Precision/% | F1/% | Weight | Recall/% | Precision/% | F1/% | |
0.5 | 61.50 | 75.73 | 67.88 | 2.8 | 65.42 | 73.32 | 69.15 | |
0.9 | 62.08 | 76.70 | 68.62 | 3.0 | 65.77 | 75.21 | 70.17 | |
1.3 | 63.48 | 77.50 | 69.80 | 3.5 | 66.45 | 71.52 | 68.90 | |
2.0 | 62.91 | 76.62 | 69.09 | 4.0 | 67.33 | 68.89 | 68.10 | |
2.5 | 64.60 | 73.72 | 68.86 | 5.0 | 69.70 | 64.73 | 67.12 |
Table 8
Comparison of results for different values of the variables F and b
Threshold | Recall/% | Precision/% | F1/% | Number of paths | Recall/% | Precision/% | F1/% | |
0.5 | 65.60 | 72.45 | 68.86 | 5 | 71.25 | 63.67 | 67.25 | |
1.0 | 65.20 | 72.53 | 68.67 | 10 | 65.77 | 75.21 | 70.17 | |
1.5 | 65.77 | 75.21 | 70.17 | 15 | 62.48 | 75.02 | 68.18 | |
2.0 | 65.24 | 73.01 | 68.90 | 20 | 59.90 | 76.41 | 67.15 | |
2.5 | 65.92 | 72.59 | 69.09 | 25 | 56.40 | 77.59 | 65.32 |
Table 9
Comparison of results for different values of the variables d and r
Dimension | Recall/% | Precision/% | F1/% | Path length | Recall/% | Precision/% | F1/% | |
10 | 66.76 | 61.43 | 63.98 | 10 | 75.25 | 55.19 | 63.68 | |
25 | 73.36 | 58.06 | 64.82 | 25 | 65.77 | 75.21 | 70.17 | |
50 | 70.15 | 63.10 | 66.44 | 35 | 63.15 | 74.40 | 68.31 | |
100 | 65.77 | 75.21 | 70.17 | 50 | 60.12 | 76.07 | 67.16 | |
200 | 66.82 | 72.49 | 67.64 | 100 | 48.53 | 79.18 | 60.17 |
1 | DONG Y, CHAWLA N V, SWAMI A. metapath2vec: Scalable representation learning for heterogeneous networks [C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017: 135-144. |
2 | PEROZZI B, ALRFOU R, SKIENA S. Deepwalk: Online learning of social representations [C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014: 701-710. |
3 | ROBERTSON S. Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation, 2004, 60 (5): 503-520. doi: 10.1108/00220410410560582 |
4 | ZHANG Y, ZHANG F, YAO P, et al. Name disambiguation in AMiner: Clustering, maintenance, and human in the loop [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2018: 1002-1011. |
5 | HAN H, GILES L, ZHA H, et al. Two supervised learning approaches for name disambiguation in author citations [C]// Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries. IEEE, 2004: 296-305. |
6 | VELOSO A, FERREIRA A A, GONCALVES M A, et al. Cost-effective on-demand associative author name disambiguation. Information Processing and Management, 2012, 48 (4): 680-697. doi: 10.1016/j.ipm.2011.08.005 |
7 | YOSHIDA M, IKEDA M, ONO S, et al. Person name disambiguation by bootstrapping [C]// Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2010: 10-17. |
8 | HAN X, ZHAO J. Named entity disambiguation by leveraging wikipedia semantic knowledge [C]// Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009: 215-224. |
9 | TANG J, ZHANG J, ZHANG D, et al. A unified framework for name disambiguation [C]// Proceedings of the 17th International Conference on World Wide Web. 2008: 1205-1206. |
10 | DENG C, DENG H, LI C. A scholar disambiguation method based on heterogeneous relation-fusion and attribute enhancement. IEEE Access, 2020, 8: 28375-28384. doi: 10.1109/ACCESS.2020.2972372 |
11 | FAN X, WANG J, PU X, et al. On graph-based name disambiguation. Journal of Data and Information Quality, 2011, 2 (2): 1-23. |
12 | MALIN B. Unsupervised name disambiguation via social network similarity [C]// Proceedings of the Workshop on Link Analysis, Counterterrorism and Security. 2005: 93-102. |
13 | ZHANG W, YAN Z, ZHENG Y. Author name disambiguation using graph node embedding method [C]// Proceedings of the 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2019: 410-415. |
14 | ZHANG B, HASAN M A. Name disambiguation in anonymized graphs using network embedding [C]// Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 2017: 1239-1248. |
15 | KIM K, ROHATGI S, GILES C L. Hybrid deep pairwise classification for author name disambiguation [C]// Proceedings of the 2019 ACM on Conference on Information and Knowledge Management. 2019: 2369-2372. |
16 | PENG L, SHEN S, XU J, et al. Diting: An author disambiguation method based on network representation learning. IEEE Access, 2019, 7: 135539-135555. doi: 10.1109/ACCESS.2019.2942477 |
17 | PENG L, SHEN S, LI D, et al. Author disambiguation through adversarial network representation learning [C]// International Joint Conference on Neural Networks. 2019: paper N-19712. |
18 | WANG H, WANG R, WEN C, et al. Author name disambiguation on heterogeneous information network with adversarial representation learning [C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2020: 238-245. |
19 | QIAO Z, DU Y, FU Y, et al. Unsupervised author disambiguation using heterogeneous graph convolutional network embedding [C]// Proceedings of the 2019 IEEE International Conference on Big Data. IEEE, 2019: 910-919. |
20 | WANG X, TANG J, CHENG H, et al. ADANA: Active name disambiguation [C]// 2011 11th IEEE International Conference on Data Mining. IEEE, 2011: 794-803. |
21 | NG V. Machine learning for entity coreference resolution: A retrospective look at two decades of research [C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2017: 4877-4884. |
22 | TANG X, ZHANG J, CHEN B, et al. BERT-INT: A BERT-based interaction model for knowledge graph alignment [C]// Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. 2020: 3174-3180. |