Funding
National Natural Science Foundation of China (62277017)
Automatic generation of Web front-end code based on UI images
Received date: 2023-06-30
Accepted date: 2023-07-22
Online published: 2023-09-20
GE Jin, LU Xuesong. Automatic generation of Web front-end code based on UI images [J]. Journal of East China Normal University (Natural Science), 2023, 2023(5): 100-109. DOI: 10.3969/j.issn.1000-5641.2023.05.009
User interfaces (UIs) play a vital role in the interaction between an application and its users. The popularity of the mobile Internet has driven a large-scale migration of web-based applications from desktop to mobile, and web front-end development now occupies an increasingly broad and deep role in application development. Traditional web front-end development relies on designers to produce design drafts and on programmers to then write the corresponding UI code. This approach imposes high skill barriers and slows development, hindering rapid product iteration. Advances in deep learning have made it possible to generate web front-end code automatically from UI images, but existing methods capture UI image features poorly and produce code of low accuracy. To mitigate these problems, we propose an encoder-decoder model called image2code, based on the Swin Transformer, for generating web front-end code from UI images. Image2code treats the generation of web front-end code from UI images as a form of image captioning and uses the Swin Transformer, with its shifted-window design, as the backbone network of both the encoder and the decoder. The windowed attention restricts attention computation to a local window, reducing the computational cost of the attention mechanism, while the window shifting preserves feature connections across windows. In addition, image2code generates Emmet code, which is far more compact than HTML (hyper text markup language) and can be converted to it directly, improving the efficiency of model training. Experimental results show that image2code outperforms representative models such as pix2code and image2emmet on the web front-end code generation task, on both an existing public dataset and a newly constructed one.
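To make the windowed attention concrete, the following is a minimal PyTorch sketch of window-based self-attention in the style of the Swin Transformer (an illustration of the general technique, not the authors' implementation; the function name window_attention is ours). The feature map is partitioned into non-overlapping M x M windows and self-attention is computed only inside each window, so the cost grows with the window size rather than with the full image size; in the shifted-window variant, the map is additionally rolled by M/2 before partitioning so that features can interact across window boundaries.

    import torch
    import torch.nn.functional as F

    def window_attention(x, window_size=4, shift=False):
        """x: (B, H, W, C) feature map; returns a tensor of the same shape."""
        B, H, W, C = x.shape
        M = window_size
        if shift:  # shifted windows let information cross window boundaries
            x = torch.roll(x, shifts=(-M // 2, -M // 2), dims=(1, 2))
        # Partition into (B * num_windows, M*M, C) non-overlapping windows.
        w = (x.view(B, H // M, M, W // M, M, C)
              .permute(0, 1, 3, 2, 4, 5)
              .reshape(-1, M * M, C))
        # Scaled dot-product self-attention restricted to each window.
        # (The real model adds linear projections, multiple heads, a relative
        # position bias, and a mask for the shifted case; omitted for brevity.)
        scores = w @ w.transpose(1, 2) / C ** 0.5
        out = F.softmax(scores, dim=-1) @ w
        # Merge the windows back into the (B, H, W, C) layout.
        out = (out.view(B, H // M, W // M, M, M, C)
                  .permute(0, 1, 3, 2, 4, 5)
                  .reshape(B, H, W, C))
        if shift:
            out = torch.roll(out, shifts=(M // 2, M // 2), dims=(1, 2))
        return out

    y = window_attention(torch.randn(2, 8, 8, 16))
    print(y.shape)  # torch.Size([2, 8, 8, 16])

Because each of the (H/M)(W/M) windows attends over only M*M positions, the attention cost is linear in the image area for a fixed window size, rather than quadratic as in global self-attention.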
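The choice of Emmet as the generation target can likewise be illustrated with a toy example. The few lines of Python below (a hypothetical sketch, not the converter used in the paper) expand just two Emmet operators, ">" for nesting and "*" for repetition, and already show how much shorter the abbreviation is than the HTML it denotes; shorter target sequences mean fewer decoding steps during training and inference.

    def expand(abbrev: str) -> str:
        """Expand a tiny Emmet subset ('>' nesting, '*' repetition) to HTML."""
        tag, _, rest = abbrev.partition(">")
        name, _, count = tag.partition("*")
        inner = expand(rest) if rest else ""
        node = f"<{name}>{inner}</{name}>"
        return node * (int(count) if count else 1)

    print(expand("div>ul>li*3"))
    # <div><ul><li></li><li></li><li></li></ul></div>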
1 BAULÉ D D S, WANGENHEIM C, WANGENHEIM A V, et al. Recent progress in automated code generation from GUI images using machine learning techniques [J]. Journal of Universal Computer Science, 2020, 26(9): 1095.
2 BELTRAMELLI T. Pix2code: Generating code from a graphical user interface screenshot [C]// Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, 2018.
3 O'SHEA K, NASH R. An introduction to convolutional neural networks [EB/OL]. (2015-11-26)[2023-06-30]. https://arxiv.org/pdf/1511.08458.pdf.
4 YU Y, SI X, HU C, et al. A review of recurrent neural networks: LSTM cells and network architectures [J]. Neural Computation, 2019, 31(7): 1235-1270.
5 ZHU Z, XUE Z, YUAN Z. Automatic graphics program generation using attention-based hierarchical decoder [M]// JAWAHAR C, LI H, MORI G, et al. Computer Vision – ACCV 2018. Cham: Springer, 2019: 181-196.
6 XU Y, BO L, SUN X, et al. image2emmet: Automatic code generation from web user interface image [J]. Journal of Software: Evolution and Process, 2021, 33(8): e2369.
7 CHEN W Y, PODSTRELENY P, CHENG W H, et al. Code generation from a graphical user interface via attention-based encoder-decoder model [J]. Multimedia Systems, 2021(5): 1-10.
8 XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention [EB/OL]. (2015-02-10)[2023-06-30]. https://arxiv.org/pdf/1502.03044.pdf.
9 HOSSAIN M Z, SOHEL F, SHIRATUDDIN M F, et al. A comprehensive survey of deep learning for image captioning [J]. ACM Computing Surveys, 2019, 51(6): 1-36.
10 XIAN T, LI Z, ZHANG C, et al. Dual global enhanced transformer for image captioning [J]. Neural Networks, 2022, 148: 129-141.
11 TAN J H, TAN Y H, CHAN C S, et al. ACORT: A compact object relation transformer for parameter efficient image captioning [J]. Neurocomputing, 2022, 482: 60-72.
12 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012-10022.
13 BAJAMMAL M, MAZINANIAN D, MESBAH A. Generating reusable web components from mockups [C]// Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 2018: 601-611.
14 ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6077-6086.
15 WEI Y, WU C, LI G, et al. Sequential transformer via an outside-in attention for image captioning [J]. Engineering Applications of Artificial Intelligence, 2022, 108: 104574.
16 BEN H, PAN Y, LI Y, et al. Unpaired image captioning with semantic-constrained self-learning [J]. IEEE Transactions on Multimedia, 2021, 24: 904-916.
17 WANG Y, XU J, SUN Y. End-to-end transformer based model for image captioning [C]// Proceedings of the AAAI Conference on Artificial Intelligence. 2022: 2585-2594.
18 LUO J, LI Y, PAN Y, et al. Semantic-conditional diffusion networks for image captioning [C]// 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 23359-23368.
19 RAMOS R, MARTINS B, ELLIOTT D, et al. SmallCap: Lightweight image captioning prompted with retrieval augmentation [C]// 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 2840-2849.
20 WEI J, LI Z, ZHU J, et al. Enhance understanding and reasoning ability for image captioning [J]. Applied Intelligence, 2023, 53(3): 2706-2722.
21 GUNDECHA U. Learning Selenium Testing Tools with Python [M]. Birmingham: Packt Publishing, 2014.
22 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
23 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2020-10-22)[2023-06-30]. https://arxiv.org/pdf/2010.11929.pdf.
24 PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation [C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
25 BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments [C]// Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, 2005: 65-72.