Prompting open-source code large language models for student program repair
Received date: 2024-07-09
Accepted date: 2024-08-01
Online published: 2024-09-23
Advances in machine-learning technology have enabled automated program-repair techniques that learn how humans fix erroneous code, thereby assisting students in debugging and improving the efficiency of their self-directed learning. Automatic program-repair models are typically based on either manually designed symbolic rules or data-driven methods. Owing to the availability of large language models with excellent natural-language understanding and code-generation capabilities, researchers have attempted to use prompt engineering for automatic program repair. However, existing studies primarily evaluate commercial models such as Codex and GPT-4, which may incur high costs for large-scale adoption and raise data-privacy concerns in educational scenarios. Furthermore, these studies typically employ simple prompt forms to assess the program-repair capabilities of large language models and do not analyze the results comprehensively. Hence, we evaluate two representative open-source code large language models with excellent code-generation capability using prompt engineering. We compare different prompting methods, such as chain-of-thought and few-shot learning, and analyze the results comprehensively. Finally, we provide suggestions for integrating large language models into programming-education scenarios.
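For context, the following is a minimal sketch of the kind of few-shot, chain-of-thought prompting pipeline the abstract describes, assuming a locally hosted open-source code model. The model name (deepseek-ai/deepseek-coder-6.7b-instruct), the prompt template, and the buggy example program are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: few-shot + chain-of-thought prompting for student program repair
# with an open-source code LLM. Model name and prompt template are assumed
# for illustration; they are not the paper's exact configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

# One worked example (few-shot) followed by the buggy student program.
# The "think step by step" instruction elicits chain-of-thought reasoning.
prompt = '''You are a tutor fixing students' buggy Python programs.
Think step by step: first explain the bug, then output the fixed code.

### Example
Buggy program:
def mean(xs):
    return sum(xs) / len(xs) + 1
Explanation: the "+ 1" wrongly shifts the average upward.
Fixed program:
def mean(xs):
    return sum(xs) / len(xs)

### Task
Buggy program:
def is_even(n):
    return n % 2 == 1
Explanation:'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens (the model's explanation and fix).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

In an educational pipeline like the one studied here, the generated fix would then be validated against the assignment's test cases; the snippet above only illustrates the prompting step.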
Zhirui CHEN, Xuesong LU. Prompting open-source code large language models for student program repair[J]. Journal of East China Normal University (Natural Science), 2024, 2024(5): 93-103. DOI: 10.3969/j.issn.1000-5641.2024.05.009