J* E* C* N* U* N* S* ›› 2025, Vol. 2025 ›› Issue (5): 1-13. doi: 10.3969/j.issn.1000-5641.2025.05.001
• AI-Enabled Open Source Technologies and Applications •
Sijia ZHAO, Fanyu HAN, Wei WANG*
Received: 2025-01-15
Online: 2025-09-25
Published: 2025-09-25
Contact: Wei WANG
E-mail: wwang@dase.ecnu.edu.cn
Sijia ZHAO, Fanyu HAN, Wei WANG. Research on the GitHub developer geographic location prediction method based on multi-dimensional feature fusion[J]. J* E* C* N* U* N* S*, 2025, 2025(5): 1-13.
Table 2
Performance of the different models in geographic prediction
Model | Accuracy | Precision | Recall | F1 score |
XGBoost | | | | |
LSTM | | | | |
BiLSTM | | | | |
Gradient Boosting | | | | |
CatBoost | | | | |
Random forest | | | | |
Logistic regression | | | | |
Extra Trees | | | | |
Ensemble model | | | | |
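The four columns of Table 2 can be illustrated with a minimal sketch of how such scores are typically computed for a multi-class country classifier. The macro averaging over countries, the function name `classification_metrics`, and the toy labels are assumptions made for illustration; the page does not state which averaging scheme the authors used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this country
        else:
            fp[pred] += 1    # predicted country was wrong
            fn[truth] += 1   # true country was missed

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class has no support.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Toy example with hypothetical country labels.
scores = classification_metrics(
    ["CN", "US", "US", "DE"],
    ["CN", "US", "DE", "DE"],
)
```

Macro averaging weights every country equally, which matters when a few countries dominate the developer population; a weighted average would instead favor the majority classes.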
Table 4
Prompt policy
Prompt |
Predict the country of residence for this GitHub user. Use all available profile information to make an informed geographical assessment. Be specific and evidence-based. If multiple countries are equally likely, choose the most probable based on strongest indicators. **User Profile:** - Company/Organization: {company} - Bio/Description: {bio} - Username: {username} - Email domain: {email_domain} **Output Format:** Country: [country name] |
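As a hedged illustration of how the Table 4 template might be used, the sketch below fills the four placeholders from a profile record and extracts the answer from a reply in the `Country: [country name]` format. The helper names `build_prompt` and `parse_country`, the line breaks inside the template, and the `"N/A"` fallback for missing fields are assumptions, not details given by the source; the call to a language model is deliberately left out.

```python
import re

# Prompt text from Table 4; the line layout is an assumed reconstruction.
PROMPT_TEMPLATE = (
    "Predict the country of residence for this GitHub user. "
    "Use all available profile information to make an informed "
    "geographical assessment. Be specific and evidence-based. "
    "If multiple countries are equally likely, choose the most probable "
    "based on strongest indicators.\n"
    "**User Profile:**\n"
    "- Company/Organization: {company}\n"
    "- Bio/Description: {bio}\n"
    "- Username: {username}\n"
    "- Email domain: {email_domain}\n"
    "**Output Format:**\n"
    "Country: [country name]"
)

FIELDS = ("company", "bio", "username", "email_domain")

# Matches "Country: Germany" as well as "Country: [Germany]".
COUNTRY_RE = re.compile(r"^\s*Country:\s*\[?([^\]\n]+?)\]?\s*$", re.MULTILINE)


def build_prompt(profile):
    """Fill the template; missing or empty fields become "N/A" (assumption)."""
    values = {name: (profile.get(name) or "N/A") for name in FIELDS}
    return PROMPT_TEMPLATE.format(**values)


def parse_country(reply):
    """Return the country named in the reply, or None if the format is absent."""
    match = COUNTRY_RE.search(reply)
    return match.group(1).strip() if match else None


# Hypothetical profile used only to show the calls.
prompt = build_prompt(
    {"company": "ECNU", "bio": None, "username": "alice", "email_domain": "example.org"}
)
```

Returning `None` for replies that do not follow the output format keeps malformed answers out of the evaluation instead of silently mapping them to a country.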
1 | DABBISH L, STUART C, TSAY J, et al. Social coding in GitHub: Transparency and collaboration in an open software repository [C]// Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. 2012: 1277-1286. |
2 | WACHS J, NITECKI M, SCHUELLER W, et al. The geography of open source software: evidence from GitHub. Technological Forecasting and Social Change, 2022, 176: 121478. |
3 | NAGLE F. Open source software and firm productivity. Management Science, 2019, 65(3): 1191-1215. |
4 | WRIGHT N L, NAGLE F, GREENSTEIN S. Open source software and global entrepreneurship. Research Policy, 2023, 52(9): 104846. |
5 | RUSK D, COADY Y. Location-based analysis of developers and technologies on GitHub [C]// 2014 28th International Conference on Advanced Information Networking and Applications Workshops. IEEE, 2014: 681-685. |
6 | ALBUSAYS K, BJORN P, DABBISH L, et al. The diversity crisis in software development. IEEE Software, 2021, 38(2): 19-25. |
7 | MAY A, WACHS J, HANNÁK A. Gender differences in participation and reward on Stack Overflow. Empirical Software Engineering, 2019, 24: 1997-2019. |
8 | PRANA G A A, FORD D, RASTOGI A, et al. Including everyone, everywhere: Understanding opportunities and challenges of geographic gender-inclusion in OSS. IEEE Transactions on Software Engineering, 2021, 48(9): 3394-3409. |
9 | DAHLANDER L, GANN D M, WALLIN M W. How open is innovation? A retrospective and ideas forward. Research Policy, 2021, 50(4): 104218. |
10 | TAKHTEYEV Y. Coding Places: Software Practice in a South American City [M]. Cambridge MA, USA: MIT Press, 2012. |
11 | SHAIKH M, VAAST E. Folding and unfolding: Balancing openness and transparency in open source communities. Information Systems Research, 2016, 27(4): 813-833. |
12 | ZHANG S, ZHENG D Q, HU X C, et al. Bidirectional long short-term memory networks for relation classification [C]// Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. 2015: 73-78. |
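13 | YUAN L, WANG H M, YIN G, et al. Mining and analysis of developer behavior characteristics in open source environments (in Chinese). Chinese Journal of Computers, 2010, 33(10): 1910-1918. |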
14 | FACKLER T, LAURENTSYEVA N. Gravity in online collaborations: evidence from GitHub [C]// CESifo Forum, 2020, 21(3): 15-20. |
15 | LIMA A, ROSSI L, MUSOLESI M. Coding together at scale: GitHub as a collaborative social network [C]// Proceedings of the International AAAI Conference on Web and Social Media. 2014, 8(1): 295-304. |
16 | NADRI R, RODRÍGUEZ-PÉREZ G, NAGAPPAN M. On the relationship between the developer’s perceptible race and ethnicity and the evaluation of contributions in OSS. IEEE Transactions on Software Engineering, 2021, 48(8): 2955-2968. |
17 | RASTOGI A, NAGAPPAN N, GOUSIOS G, et al. Relationship between geographical location and evaluation of developer contributions in GitHub [C]// Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 2018: 22. |
18 | RASTOGI A. Do biases related to geographical location influence work-related decisions in GitHub? [C]// Proceedings of the 38th International Conference on Software Engineering Companion. 2016: 665-667. |
19 | QUERCIA D, CAPRA L, CROWCROFT J. The social world of Twitter: Topics, geography, and emotions [C]// Proceedings of the International AAAI Conference on Web and Social Media. 2012, 6(1): 298-305. |
20 | JAIDKA K, GIORGI S, SCHWARTZ H A, et al. Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proceedings of the National Academy of Sciences, 2020, 117(19): 10165-10171. |
21 | KOLLANYI B. Where do bots come from? An analysis of bot codes shared on GitHub. International Journal of Communication, 2016, 10: 4932-4951. |
22 | KOMOSNY D, MEHIC M. The value of geographic locations submitted by Internet users. IEEE Access, 2018, 6: 62699-62706. |
23 | HAN B, COOK P, BALDWIN T. Text-based twitter user geolocation prediction. Journal of Artificial Intelligence Research, 2014, 49: 451-500. |
24 | ROSSI D, ZACCHIROLI S. Geographic diversity in public code contributions: an exploratory large-scale study over 50 years [C]// Proceedings of the 19th International Conference on Mining Software Repositories. 2022: 80-85. |
25 | LE TOURNEAU T, LATENDRESSE J, ABDELLATIF A, et al. Code mapper: Mapping the Global contributions of OSS [C]// Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 2024: 44-48. |
26 | XIA X Y, ZHAO S Y, HAN F Y, et al. OpenDigger: Data mining and information service system for open collaboration digital ecosystem [EB/OL]. (2023-11-26)[2025-01-10]. https://doi.org/10.48550/arXiv.2311.15204. |
27 | Forebears [EB/OL]. [2025-01-10]. https://forebears.io/about/name-distribution-and-demographics. |
28 | CLAES M, MÄNTYLÄ M V, KUUTILA M, et al. Do programmers work at night or during the weekend? [C]// Proceedings of the 40th International Conference on Software Engineering. 2018: 705-715. |
29 | ZHAO S Y, XIA X Y, FITZGERALD B, et al. OpenRank leaderboard: Motivating open source collaborations through social network evaluation in Alibaba [C]// Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice. 2024: 346-357. |
30 | BREIMAN L. Random forests. Machine Learning, 2001, 45: 5-32. |
31 | FRIEDMAN J H. Stochastic gradient boosting. Computational Statistics & Data Analysis, 2002, 38(4): 367-378. |
32 | CHEN T Q, GUESTRIN C. XGBoost: A scalable tree boosting system [C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794. |
33 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780. |
34 | DOROGUSH A V, ERSHOV V, GULIN A. CatBoost: Gradient boosting with categorical features support [EB/OL]. (2018-10-24) [2025-01-10]. https://doi.org/10.48550/arXiv.1810.11363. |
35 | LaVALLEY M P. Logistic regression. Circulation, 2008, 117(18): 2395-2399. |
36 | GEURTS P, ERNST D, WEHENKEL L. Extremely randomized trees. Machine Learning, 2006, 63: 3-42. |