Journal of East China Normal University (Natural Science)
Automatic generation of Web front-end code based on UI images
Received date: 2023-06-30
Accepted date: 2023-07-22
Online published: 2023-09-20
User interfaces (UIs) play a vital role in the interaction between an application and its users. The popularity of the mobile Internet has driven a large-scale migration of web applications from desktop to mobile devices, and web front-end development now occupies an ever larger share of application development. Traditionally, web front-end development relies on designers to produce initial design drafts and on programmers to write the corresponding UI code. This workflow has a high entry barrier and slows development, which is not conducive to rapid product iteration. Advances in deep learning have made it possible to generate web front-end code automatically from UI images. However, existing methods capture the features of UI images poorly, and the accuracy of the generated code is low. To mitigate these problems, we propose an encoder–decoder model based on the Swin Transformer, called image2code, that generates web front-end code from UI images. Image2code treats the generation of web front-end code from UI images as an image captioning task and uses the Swin Transformer, with its shifted-window design, as the backbone network of the encoder and decoder. The window-based attention restricts the attention computation to local windows, which reduces the cost of the attention mechanism, while the window shifting preserves feature connections across windows. In addition, image2code generates Emmet code, which is far more compact than HTML and can be converted to HTML directly, improving the efficiency of model training. Experimental results show that image2code outperforms representative existing models, such as pix2code and image2emmet, on the task of web front-end code generation over both existing and newly constructed datasets.
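To make the computational argument concrete, the following minimal NumPy sketch partitions a feature map into non-overlapping windows and applies self-attention within each window only. It is a simplified illustration of window attention, not the paper's implementation: the query/key/value projections, multi-head splitting, relative position bias, and the window-shifting step that the actual Swin Transformer uses are all omitted, and the function name and shapes are our own choices.

```python
import numpy as np

def window_attention(x, window_size):
    """Self-attention restricted to non-overlapping windows (simplified sketch).

    x is an (H, W, C) feature map with H and W divisible by window_size.
    Q/K/V projections and multi-head logic are omitted for brevity.
    """
    H, W, C = x.shape
    ws = window_size
    # Partition the map into (num_windows, ws*ws, C) groups of tokens.
    windows = (x.reshape(H // ws, ws, W // ws, ws, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, ws * ws, C))
    # Scaled dot-product attention computed within each window only, so the
    # cost is O(num_windows * (ws**2)**2) rather than the O((H*W)**2) of
    # global attention over the whole feature map.
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(C)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ windows
    # Undo the partition back to an (H, W, C) map.
    return (out.reshape(H // ws, W // ws, ws, ws, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(H, W, C))

# Example: a 56x56 feature map with 7x7 windows, sizes typical for Swin.
features = np.random.rand(56, 56, 96).astype(np.float32)
print(window_attention(features, window_size=7).shape)  # (56, 56, 96)
```

In the actual Swin Transformer, alternating layers shift the window grid by half a window size, which is what lets information propagate across window boundaries despite the local attention.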
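The benefit of targeting Emmet is that an Emmet abbreviation encodes an HTML tree far more compactly, so the decoder emits much shorter sequences. The toy expander below, a hypothetical sketch rather than the paper's tooling, handles only two pieces of Emmet syntax, child nesting with ">" and repetition with "*", but it shows how an abbreviation unfolds deterministically into HTML.

```python
def expand_emmet(abbrev: str) -> str:
    """Expand a tiny subset of Emmet abbreviations into HTML.

    Supports only child nesting ('>') and repetition ('*'), e.g.
    'div>ul>li*3'. Real Emmet supports far richer syntax
    (ids, classes, attributes, siblings, numbering, ...).
    """
    def build(parts):
        if not parts:
            return ""
        head, *rest = parts
        tag, _, count = head.partition("*")
        repeat = int(count) if count else 1
        inner = build(rest)  # everything to the right becomes the children
        return "".join(f"<{tag}>{inner}</{tag}>" for _ in range(repeat))

    return build(abbrev.split(">"))

print(expand_emmet("div>ul>li*3"))
# <div><ul><li></li><li></li><li></li></ul></div>
```

Here an 11-character abbreviation expands to 47 characters of markup; on realistic pages the ratio is larger still, which shortens the target sequences the model must learn to generate and thereby speeds up training.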
Jin GE, Xuesong LU. Automatic generation of Web front-end code based on UI images [J]. Journal of East China Normal University (Natural Science), 2023, 2023(5): 100-109. DOI: 10.3969/j.issn.1000-5641.2023.05.009
[1] BAULÉ D D S, WANGENHEIM C, WANGENHEIM A V, et al. Recent progress in automated code generation from GUI images using machine learning techniques [J]. Journal of Universal Computer Science, 2020, 26(9): 1095.
[2] BELTRAMELLI T. pix2code: Generating code from a graphical user interface screenshot [C]// Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, 2018.
[3] O'SHEA K, NASH R. An introduction to convolutional neural networks [EB/OL]. (2015-11-26)[2023-06-30]. https://arxiv.org/pdf/1511.08458.pdf.
[4] YU Y, SI X, HU C, et al. A review of recurrent neural networks: LSTM cells and network architectures [J]. Neural Computation, 2019, 31(7): 1235-1270.
[5] ZHU Z, XUE Z, YUAN Z. Automatic graphics program generation using attention-based hierarchical decoder [M]// JAWAHAR C, LI H, MORI G, et al. Computer Vision – ACCV 2018. Cham: Springer, 2019: 181-196.
[6] XU Y, BO L, SUN X, et al. image2emmet: Automatic code generation from web user interface image [J]. Journal of Software: Evolution and Process, 2021, 33(8): e2369.
[7] CHEN W Y, PODSTRELENY P, CHENG W H, et al. Code generation from a graphical user interface via attention-based encoder–decoder model [J]. Multimedia Systems, 2021(5): 1-10.
[8] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention [EB/OL]. (2015-02-10)[2023-06-30]. https://arxiv.org/pdf/1502.03044.pdf.
[9] HOSSAIN M Z, SOHEL F, SHIRATUDDIN M F, et al. A comprehensive survey of deep learning for image captioning [J]. ACM Computing Surveys, 2019, 51(6): 1-36.
[10] XIAN T, LI Z, ZHANG C, et al. Dual global enhanced transformer for image captioning [J]. Neural Networks, 2022, 148: 129-141.
[11] TAN J H, TAN Y H, CHAN C S, et al. ACORT: A compact object relation transformer for parameter efficient image captioning [J]. Neurocomputing, 2022, 482: 60-72.
[12] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012-10022.
[13] BAJAMMAL M, MAZINANIAN D, MESBAH A. Generating reusable web components from mockups [C]// Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 2018: 601-611.
[14] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6077-6086.
[15] WEI Y, WU C, LI G, et al. Sequential transformer via an outside-in attention for image captioning [J]. Engineering Applications of Artificial Intelligence, 2022, 108: 104574.
[16] BEN H, PAN Y, LI Y, et al. Unpaired image captioning with semantic-constrained self-learning [J]. IEEE Transactions on Multimedia, 2021, 24: 904-916.
[17] WANG Y, XU J, SUN Y. End-to-end transformer based model for image captioning [C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2022: 2585-2594.
[18] LUO J, LI Y, PAN Y, et al. Semantic-conditional diffusion networks for image captioning [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 23359-23368.
[19] RAMOS R, MARTINS B, ELLIOTT D, et al. SmallCap: Lightweight image captioning prompted with retrieval augmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 2840-2849.
[20] WEI J, LI Z, ZHU J, et al. Enhance understanding and reasoning ability for image captioning [J]. Applied Intelligence, 2023, 53(3): 2706-2722.
[21] GUNDECHA U. Learning Selenium Testing Tools with Python [M]. Birmingham: Packt Publishing, 2014.
[22] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[23] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2020-10-22)[2023-06-30]. https://arxiv.org/pdf/2010.11929.pdf.
[24] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation [C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
[25] BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments [C]// Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, 2005: 65-72.