Prompting open-source code large language models for student program repair
Received date: 2024-07-09
Accepted date: 2024-08-01
Online published: 2024-09-23
Advances in machine-learning technology have enabled automated program-repair techniques that learn how humans fix erroneous code, thereby assisting students in debugging and improving the efficiency of their self-directed learning. Automatic program-repair models are typically based on either manually designed symbolic rules or data-driven methods. Owing to the availability of large language models with excellent natural-language understanding and code-generation capabilities, researchers have attempted to use prompt engineering for automatic program repair. However, existing studies primarily evaluate commercial models such as Codex and GPT-4, which may incur high costs for large-scale adoption and raise data-privacy concerns in educational scenarios. Furthermore, these studies typically employ simple prompt forms to assess the program-repair capabilities of large language models and do not analyze the results comprehensively. Hence, we evaluate two representative open-source code large language models with excellent code-generation capability using prompt engineering. We compare different prompting methods, such as chain-of-thought and few-shot learning, and analyze the results comprehensively. Finally, we provide suggestions for integrating large language models into programming-education scenarios.
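For context, the following is a minimal sketch of the kind of few-shot, chain-of-thought prompting pipeline the abstract describes, assuming a locally hosted open-source code model. The model name (deepseek-ai/deepseek-coder-6.7b-instruct), the prompt template, and the buggy example program are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: few-shot + chain-of-thought prompting for student program repair
# with an open-source code LLM. Model name and prompt template are assumed
# for illustration; they are not the paper's exact configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

# One worked example (few-shot) followed by the buggy student program.
# The "think step by step" instruction elicits chain-of-thought reasoning.
prompt = '''You are a tutor fixing students' buggy Python programs.
Think step by step: first explain the bug, then output the fixed code.

### Example
Buggy program:
def mean(xs):
    return sum(xs) / len(xs) + 1
Explanation: the "+ 1" wrongly shifts the average upward.
Fixed program:
def mean(xs):
    return sum(xs) / len(xs)

### Task
Buggy program:
def is_even(n):
    return n % 2 == 1
Explanation:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens (the model's explanation and fix).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

In an educational pipeline like the one studied here, the generated fix would then be validated against the assignment's test cases; the snippet above only illustrates the prompting step.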
Zhirui CHEN, Xuesong LU. Prompting open-source code large language models for student program repair[J]. Journal of East China Normal University (Natural Science), 2024, 2024(5): 93-103. DOI: 10.3969/j.issn.1000-5641.2024.05.009