Journal of East China Normal University (Natural Science) ›› 2025, Vol. 2025 ›› Issue (5): 87-98. doi: 10.3969/j.issn.1000-5641.2025.05.009

• Innovative Practices of Open Source and Artificial Intelligence in Education •

Student employment prediction for digital jobs based on behavior in open-source communities

Linna XIE, Xuesong LU*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2025-06-27 Accepted: 2025-08-06 Online: 2025-09-25 Published: 2025-09-25
  • Contact: Xuesong LU E-mail: xslu@dase.ecnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62277017)

Abstract:

Accurately predicting students’ post-graduation career paths is important for talent development in higher education and for recruitment strategy in industry. Most existing work on student employment prediction relies on academic records or other on-campus behavioral data and overlooks the role that students’ open-source contributions play in securing digital jobs. This study addresses employment prediction for digital jobs using students’ behavioral data from open-source communities. We construct a heterogeneous information network comprising students, code repositories, and multiple semantic relationships among them, and use it to extract students’ technical features. We then explore two strategies that integrate large language models (LLMs) with graph neural networks (GNNs), in which the LLM serves either as an encoder (LLM-as-Encoder) or as an explainer (LLM-as-Explainer), to predict the digital jobs students are likely to take after graduation. Extensive experiments on our curated dataset show that the proposed approach outperforms the baseline methods, improving accuracy by 7.71% and Macro-F1 by 9.19%. By drawing on open-source participation, the study offers data-driven decision support for university career guidance, helps enterprises identify technical talent precisely, and gives students a quantitative reference for career planning.

Key words: open-source community behavior, student employment prediction, heterogeneous information network, graph neural network, large language model
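
The heterogeneous graph of students and code repositories described in the abstract can be pictured with a minimal sketch. Everything specific here is an illustrative assumption rather than the paper's design: the relation names `commits_to` and `stars`, the node identifiers, and the `shared_repo_peers` helper are hypothetical, and the LLM and GNN components are not reproduced.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny heterogeneous graph: typed nodes and typed, directed edges."""

    def __init__(self):
        self.node_type = {}            # node id -> node type
        self.edges = defaultdict(set)  # (source id, relation) -> target ids

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, relation, dst):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, relation)].add(dst)

    def neighbors(self, src, relation):
        # .get avoids creating empty entries in the defaultdict
        return sorted(self.edges.get((src, relation), set()))


def shared_repo_peers(graph, student, relation):
    """Other students linked to at least one of the same repositories
    through the given relation (student -> repository <- student)."""
    repos = set(graph.neighbors(student, relation))
    peers = set()
    for (src, rel), targets in graph.edges.items():
        is_other_student = src != student and graph.node_type[src] == "student"
        if rel == relation and is_other_student and repos & targets:
            peers.add(src)
    return sorted(peers)


# Hypothetical toy data: three students, two repositories, two relation types.
g = HeteroGraph()
for s in ("student_a", "student_b", "student_c"):
    g.add_node(s, "student")
for r in ("repo_x", "repo_y"):
    g.add_node(r, "repository")

g.add_edge("student_a", "commits_to", "repo_x")
g.add_edge("student_b", "commits_to", "repo_x")
g.add_edge("student_c", "commits_to", "repo_y")
g.add_edge("student_a", "stars", "repo_y")

peers = shared_repo_peers(g, "student_a", "commits_to")
print(peers)
```

Keeping the relation type in the edge key means each kind of student-repository interaction can be queried separately, which is the property that distinguishes a heterogeneous graph from a plain one.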
