J* E* C* N* U* N* S*, 2025, Vol. 2025, Issue (5): 1-13. doi: 10.3969/j.issn.1000-5641.2025.05.001

• AI-Enabled Open Source Technologies and Applications •    

Research on the GitHub developer geographic location prediction method based on multi-dimensional feature fusion

Sijia ZHAO, Fanyu HAN, Wei WANG*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2025-01-15; Online: 2025-09-25; Published: 2025-09-25
  • Contact: Wei WANG, E-mail: wwang@dase.ecnu.edu.cn

Abstract:

The geographic locations of developers are important for understanding the global distribution of open source activity and for formulating regional policies. However, a substantial number of developer accounts on GitHub lack location information, which limits comprehensive analysis of the geographic distribution of the global open source ecosystem. This study proposed a hierarchical geographic location prediction framework based on multi-dimensional feature fusion. By integrating three major categories of features (temporal behavior, linguistic and cultural features, and network characteristics), the framework established a four-tier progressive prediction mechanism consisting of rule-driven rapid positioning, name-based cultural inference, time zone cross-validation, and a deep learning ensemble. Experiments on a large-scale dataset of 50,000 globally active developers showed that the method produced location predictions for 82.52% of the developers. Among these, the name-based cultural inference layer covered most users with an accuracy of 0.7629, whereas the deep learning ensemble layer handled the most complex cases with an accuracy of 0.7557. A comparison with predictions from the Moonshot large language model demonstrated the advantage of the proposed method on complex geographic inference tasks.
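The four-tier progressive mechanism can be pictured as a cascade in which each tier only sees the developers that earlier tiers left unresolved. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The keyword, surname and UTC-offset tables are hypothetical. So is the assumption that commit activity peaks at 15:00 local time, and so is the reading of "time zone cross-validation" as intersecting ambiguous surname candidates with time-zone candidates. The final tier is a toy ensemble that averages class probabilities from plain callables rather than trained deep networks.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical lookup tables, for illustration only (not the paper's data).
PROFILE_KEYWORDS = {"beijing": "China", "seoul": "South Korea", "berlin": "Germany"}
SURNAME_COUNTRIES = {
    "zhao": {"China"},
    "müller": {"Germany"},
    "kim": {"South Korea", "United States"},
    "lee": {"China", "South Korea", "United States"},
}
UTC_OFFSET_COUNTRIES = {8: {"China", "Singapore"}, 9: {"Japan", "South Korea"},
                        1: {"Germany", "France"}, -5: {"United States", "Canada"}}
ASSUMED_PEAK_LOCAL_HOUR = 15  # assumption: activity peaks mid-afternoon local time


@dataclass
class Developer:
    login: str
    name: str = ""
    profile_text: str = ""  # e.g. bio, company and blog fields
    commit_utc_hours: list[int] = field(default_factory=list)


@dataclass
class Prediction:
    country: str
    tier: str


def surname_candidates(dev: Developer) -> set[str]:
    parts = dev.name.lower().split()
    return SURNAME_COUNTRIES.get(parts[-1], set()) if parts else set()


def tier1_rules(dev: Developer) -> Optional[str]:
    # Rule-driven positioning: an explicit place keyword in the profile wins.
    text = dev.profile_text.lower()
    for keyword, country in PROFILE_KEYWORDS.items():
        if keyword in text:
            return country
    return None


def tier2_name(dev: Developer) -> Optional[str]:
    # Name inference: accept only surnames that map to exactly one country.
    candidates = surname_candidates(dev)
    return next(iter(candidates)) if len(candidates) == 1 else None


def estimate_utc_offset(hours: list[int], min_commits: int = 5) -> Optional[int]:
    # Offset = assumed local peak hour minus observed UTC peak hour, in [-12, 11].
    if len(hours) < min_commits:
        return None
    peak_utc = Counter(hours).most_common(1)[0][0]
    return (ASSUMED_PEAK_LOCAL_HOUR - peak_utc + 12) % 24 - 12


def tier3_timezone(dev: Developer) -> Optional[str]:
    # Cross-check ambiguous surname candidates against time-zone candidates.
    offset = estimate_utc_offset(dev.commit_utc_hours)
    if offset is None:
        return None
    overlap = surname_candidates(dev) & UTC_OFFSET_COUNTRIES.get(offset, set())
    return next(iter(overlap)) if len(overlap) == 1 else None


Model = Callable[[Developer], dict[str, float]]


def make_tier4_ensemble(models: list[Model], threshold: float = 0.5):
    # Toy ensemble: average per-country probabilities, abstain below threshold.
    def tier4(dev: Developer) -> Optional[str]:
        totals: Counter = Counter()
        for model in models:
            for country, p in model(dev).items():
                totals[country] += p / len(models)
        if not totals:
            return None
        country, score = totals.most_common(1)[0]
        return country if score >= threshold else None
    return tier4


def predict(dev: Developer, tiers) -> Optional[Prediction]:
    # Try tiers in order; the first one that commits to a country decides.
    for tier_name, tier in tiers:
        country = tier(dev)
        if country is not None:
            return Prediction(country, tier_name)
    return None  # developer stays unlocated


# Hypothetical stand-ins for trained models.
def model_a(dev: Developer) -> dict[str, float]:
    return {"United States": 0.6, "Canada": 0.4}


def model_b(dev: Developer) -> dict[str, float]:
    return {"United States": 0.8, "Canada": 0.2}


TIERS = [("rules", tier1_rules), ("name", tier2_name), ("timezone", tier3_timezone),
         ("ensemble", make_tier4_ensemble([model_a, model_b]))]
```

Ordering the tiers from the cheapest, most precise signals to the most expensive model means the ensemble only receives the residual hard cases. This matches the abstract's description of the deep learning layer as handling the most complex cases. For example, a developer named "Minji Kim" whose commits peak at 06:00 UTC is ambiguous by surname alone but resolves to South Korea at the time-zone tier.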

Key words: GitHub, multi-dimensional feature, deep learning, geographic location prediction
