Journal of East China Normal University (Natural Science), 2024, Vol. 2024, Issue (5): 93-103. doi: 10.3969/j.issn.1000-5641.2024.05.009

• Educational Knowledge Graphs and Large Language Models •

Prompting open-source code large language models for student program repair

Zhirui CHEN, Xuesong LU*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2024-07-09  Accepted: 2024-08-01  Online: 2024-09-25  Published: 2024-09-23
  • Contact: Xuesong LU  E-mail: xslu@dase.ecnu.edu.cn
  • Supported by: National Natural Science Foundation of China (62277017)

Abstract:

Advances in machine-learning technology have enabled automated program-repair techniques that learn how humans fix erroneous code, thereby helping students debug their programs and improving the efficiency of their self-directed learning. Automatic program-repair models have traditionally been based on either manually designed symbolic rules or data-driven methods. With the emergence of large language models that possess strong natural-language understanding and code-generation capabilities, researchers have attempted to use prompt engineering for automatic program repair. However, existing studies primarily evaluate commercial models such as Codex and GPT-4, which are costly to use at scale and raise data-privacy concerns in educational settings. Furthermore, these studies mostly employ simple prompt forms to assess the program-repair capabilities of the models and lack an in-depth analysis of the results. To address these limitations, we evaluate two representative open-source code large language models via prompt engineering, test different prompting methods such as chain-of-thought and few-shot learning, and analyze the results in depth. Finally, we offer suggestions for integrating large language models into programming-education scenarios.
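The kind of prompting workflow summarized above can be pictured with a short sketch. The snippet below is a minimal, illustrative example (not the authors' exact setup) of building a few-shot, chain-of-thought prompt for student program repair and sending it to an open-source code model through the Hugging Face transformers API; the model name codellama/CodeLlama-7b-Instruct-hf, the prompt wording, and the bug/fix pair are assumptions made for illustration only, since the abstract does not specify them.

```python
# Minimal sketch of few-shot + chain-of-thought prompting for student program repair.
# Assumptions (not from the paper): the model name, the prompt wording, and the
# example bug/fix pair below are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "codellama/CodeLlama-7b-Instruct-hf"  # assumed open-source code LLM

# One illustrative few-shot example: a buggy student submission, a short reasoning
# trace (chain of thought), and the repaired code.
FEW_SHOT_EXAMPLE = """Problem: Return the sum of a list of integers.
Buggy code:
def list_sum(xs):
    total = 0
    for x in xs:
        total = x
    return total
Reasoning: The loop assigns each element to total instead of adding it, so only the
last element is returned. Accumulate with += instead.
Fixed code:
def list_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total
"""


def build_prompt(problem: str, buggy_code: str) -> str:
    """Compose a few-shot, chain-of-thought prompt asking the model to repair student code."""
    return (
        "You are a programming tutor. Repair the student's buggy code.\n"
        "First explain the bug step by step, then give the fixed code.\n\n"
        + FEW_SHOT_EXAMPLE
        + f"\nProblem: {problem}\nBuggy code:\n{buggy_code}\nReasoning:"
    )


if __name__ == "__main__":
    prompt = build_prompt(
        problem="Return the largest value in a non-empty list of integers.",
        buggy_code=(
            "def list_max(xs):\n"
            "    m = 0\n"            # bug: fails for all-negative lists
            "    for x in xs:\n"
            "        if x > m:\n"
            "            m = x\n"
            "    return m\n"
        ),
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In practice one would vary the number of few-shot examples and the presence of the reasoning instruction to compare plain, few-shot, and chain-of-thought prompts, which is the kind of comparison the abstract describes.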

Key words: automatic program repair, large language models, prompt engineering
