J* E* C* N* U* N* S* ›› 2025, Vol. 2025 ›› Issue (5): 140-150. doi: 10.3969/j.issn.1000-5641.2025.05.013

• Open Source Ecosystem: Development and Governance

A DTA-based activity evaluation method for high-star GitHub repositories

Mingdong YOU, Jiaheng PENG, Fanyu HAN, Wei WANG*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2025-07-03 Accepted: 2025-07-09 Online: 2025-09-25 Published: 2025-09-25
  • Contact: Wei WANG E-mail: wwang@dase.ecnu.edu.cn

Abstract:

Identifying long-term active, high-star GitHub repositories is critical for supporting robust open-source communities and vital digital infrastructure. We propose a method for evaluating the long-term activity of these repositories. The method builds on a general-purpose time series prediction model that performs well at forecasting repository activity metrics, although it was not designed specifically for this task. A key innovation is the use, for the first time, of the developer activity cycle as a feature, which improves the accuracy of predicted repository development trends and gives a more nuanced view of project evolution. By modeling and mining the time series data of various activity indicators, we derive a new activity calculation formula, development trend-based activity (DTA), which supports a quantitative evaluation of a repository's actual activity level. To validate the method, we constructed a benchmark dataset with fine time granularity and broad coverage, evaluated multiple prediction models on it, and identified the best-performing model for forecasting open-source repository activity. The experimental results demonstrate that the proposed method is effective at predicting the long-term activity of repositories. Using DTA to evaluate repository activity can therefore help open-source participants identify repositories likely to sustain long-term engagement and decide where to focus their participation, thereby promoting the sustained development of open-source communities and critical digital infrastructure.

Key words: open-source repository, activity assessment, open-source community

CLC Number: