Journal of East China Normal University (Natural Science), 2025, Vol. 2025, Issue (5): 170-182. doi: 10.3969/j.issn.1000-5641.2025.05.016

• Open Source Ecosystem Development and Governance •

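OSS Insight: A platform for open source ecosystem spatiotemporal data analysis and insights

Xiaowei CHEN¹, Wei WANG¹,*, Fanyu HAN¹, Guanglei BAO², Fei DONG², Hao HUO², Chen LIU²

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  2. PingCAP, Beijing 100192, China
  • Received: 2025-01-22  Published: 2025-09-25  Online: 2025-09-25
  • Corresponding author: Wei WANG  E-mail: wayne.chen@stu.ecnu.edu.cn; wwang@dase.ecnu.edu.cn
  • About the author: Xiaowei CHEN, male, PhD candidate, research interest: open source governance. E-mail: wayne.chen@stu.ecnu.edu.cn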

Abstract:

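To make better use of massive open source ecosystem data in providing insights for community development and collaboration, we developed OSS Insight. Its system architecture and query engine are innovative in using an HTAP (hybrid transactional/analytical processing) database to efficiently store and query billions of GitHub events and in generating insights in real time through front-end visualization. Its in-depth spatiotemporal data mining models and analyzes developer behavior patterns and open source ecosystem evolution based on event time series and developers' geographic information. Its LLM (large language model)-integrated application, Data Explorer, uses an LLM to automatically convert natural language queries into SQL (structured query language), enabling intelligent question answering and trend insights over open source data. An empirical case study of Kubernetes performs open source insight analysis from three perspectives: developer insights, project evolution, and organizational collaboration. Experiments show that OSS Insight can analyze ultra-large-scale open source data efficiently and comprehensively, and that its LLM-driven interactive exploration lowers the barrier to data analysis, helps users derive insights from the data, and provides a practical analysis tool for open source community governance.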

Key words: open source ecosystem, open source insight, spatiotemporal data analysis, HTAP, LLM
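The spatiotemporal analysis described in this abstract combines event time series with developers' geographic information. At its core, this means aggregating an event stream along a time axis and a location axis. The following is a minimal sketch under stated assumptions. It uses a hypothetical, simplified `events` table (`actor_login`, `event_type`, `created_at`, `country_code`) in an in-memory SQLite database as a stand-in for the HTAP store. It is not OSS Insight's actual schema or query code.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory stand-in for the event store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " actor_login TEXT, event_type TEXT, created_at TEXT, country_code TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def monthly_activity_by_country(conn, event_type):
    """Count events and distinct developers per (month, country) cell."""
    sql = """
        SELECT substr(created_at, 1, 7) AS month,      -- 'YYYY-MM' time bucket
               country_code,                           -- spatial bucket
               COUNT(*) AS events,
               COUNT(DISTINCT actor_login) AS developers
        FROM events
        WHERE event_type = ?
        GROUP BY month, country_code
        ORDER BY month, country_code
    """
    return conn.execute(sql, (event_type,)).fetchall()


if __name__ == "__main__":
    # Toy rows; the event type names follow GitHub's public event naming.
    rows = [
        ("alice", "PullRequestEvent", "2024-01-03T10:00:00Z", "CN"),
        ("bob", "PullRequestEvent", "2024-01-15T08:30:00Z", "US"),
        ("alice", "PullRequestEvent", "2024-01-20T12:00:00Z", "CN"),
        ("carol", "PullRequestEvent", "2024-02-02T09:00:00Z", "CN"),
        ("bob", "WatchEvent", "2024-02-05T11:00:00Z", "US"),
    ]
    conn = build_demo_db(rows)
    for row in monthly_activity_by_country(conn, "PullRequestEvent"):
        print(row)  # e.g. ('2024-01', 'CN', 2, 1)
```

Counting both raw events and distinct developers per cell keeps activity volume separate from contributor headcount. A single prolific developer would otherwise look like a busy region.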

Abstract:

An open source ecosystem abounds with valuable data, yet extracting insights from it requires innovative data infrastructure and analytical methods. To address this, we developed OSS Insight, whose key innovation is the use of a hybrid transactional/analytical processing (HTAP) database to efficiently store and query billions of GitHub events, with real-time exploration offered through a visual interface. It performs in-depth spatiotemporal data analysis, modeling developer behavior and ecosystem evolution, for example by visualizing global contribution patterns. Integrated with large language models (LLMs), it converts natural language into structured query language (SQL) for intelligent querying. A case study of Kubernetes showcases its capabilities in analyzing developers, project evolution, and organizational collaboration. Experiments show that OSS Insight analyzes large-scale open source data efficiently and that its LLM-driven interaction simplifies data analysis and provides automated insights.

Key words: open source ecosystem, open source insight, spatiotemporal data analysis, HTAP, LLM
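The abstract does not detail how Data Explorer turns natural language into SQL. The following is only a minimal sketch of the general pattern: ground the model in a table schema, then execute the generated SQL only if it is a single read-only SELECT. The `events` schema, the prompt wording, the keyword blocklist, and `fake_llm` are all hypothetical. `fake_llm` is a stub standing in for a real model call.

```python
import re
import sqlite3

# Hypothetical schema string shown to the model.
SCHEMA = "events(actor_login TEXT, event_type TEXT, created_at TEXT, country_code TEXT)"


def build_prompt(question, schema=SCHEMA):
    """Ground the model in the table schema and ask for exactly one query."""
    return (
        "You translate questions about GitHub events into SQLite SQL.\n"
        f"Schema: {schema}\n"
        "Return exactly one SELECT statement and nothing else.\n"
        f"Question: {question}\nSQL:"
    )


def is_read_only_select(sql):
    """Accept a single SELECT statement; reject writes and stacked statements."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?i)select\b", stripped):
        return False
    forbidden = r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b"
    return re.search(forbidden, stripped, re.IGNORECASE) is None


def answer(question, conn, llm):
    """Generate SQL with `llm` (any callable str -> str), validate it, then run it."""
    sql = llm(build_prompt(question)).strip().rstrip(";").strip()
    if not is_read_only_select(sql):
        raise ValueError(f"refusing to run generated SQL: {sql!r}")
    return sql, conn.execute(sql).fetchall()


def fake_llm(prompt):
    # Hypothetical stub: a real system would send `prompt` to an LLM here.
    return "SELECT country_code, COUNT(*) FROM events GROUP BY country_code ORDER BY country_code;"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (actor_login TEXT, event_type TEXT,"
        " created_at TEXT, country_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            ("alice", "PushEvent", "2024-01-03T10:00:00Z", "CN"),
            ("bob", "PushEvent", "2024-01-04T10:00:00Z", "US"),
            ("carol", "WatchEvent", "2024-01-05T10:00:00Z", "CN"),
        ],
    )
    sql, rows = answer("How many events per country?", conn, fake_llm)
    print(rows)  # [('CN', 2), ('US', 1)]
```

Keeping the model behind a plain `str -> str` callable makes the stub easy to swap for a real client. The keyword blocklist is deliberately conservative and may reject some harmless queries, for example one that mentions `delete` inside a string literal.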

中图分类号: