Journal of East China Normal University (Natural Science), 2023, Vol. 2023, Issue 5: 100-109. doi: 10.3969/j.issn.1000-5641.2023.05.009

• Data Learning Systems •

Automatic generation of Web front-end code based on UI images

Jin GE, Xuesong LU*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2023-06-30   Accepted: 2023-07-22   Online: 2023-09-25   Published: 2023-09-15
  • Contact: Xuesong LU   E-mail: xslu@dase.ecnu.edu.cn
  • Funding: National Natural Science Foundation of China (62277017)

Abstract:

User interfaces (UIs) play a vital role in the interaction between an application and its users. With the spread of the mobile Internet, web-based applications have migrated from desktop to mobile on a large scale, and web front-end development now plays a broader and deeper role in application development. Traditional web front-end development relies on designers to produce design drafts and on programmers to then write the corresponding UI code. This workflow has a high barrier to entry and a slow development pace, which hinders rapid product iteration. Advances in deep learning have made it possible to generate web front-end code automatically from UI images. However, existing methods capture the features of UI images poorly, and the code they generate has low accuracy. To mitigate these problems, we propose image2code, an encoder–decoder model based on the Swin Transformer that generates web front-end code from UI images. The model treats code generation from UI images as a form of image captioning and uses the Swin Transformer, with its shifted-window design, as the backbone of both the encoder and the decoder. Restricting attention computation to a single window reduces the computational cost of the attention mechanism, while shifting the windows between successive layers preserves feature connections across windows. In addition, image2code generates Emmet code, which is more concise than HTML and can be converted directly to HTML; exploiting this conciseness improves the efficiency of model training. Experimental results on existing public datasets and on newly constructed datasets show that image2code outperforms representative models such as pix2code and image2emmet on the task of web front-end code generation.

Key words: UI images, Web front-end code generation, attention mechanism, intelligent Web development
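To make the point about Emmet concrete: an Emmet abbreviation such as `div.row>button.btn*2+input#q` encodes a nested element tree in far fewer tokens than the HTML it expands to, which is why a model emitting Emmet has a shorter target sequence. The Python sketch below expands an assumed, deliberately small subset of Emmet syntax (`>` child, `+` sibling, `*N` repetition, `.class`, `#id`, implicit `div`) into HTML. The names `Node`, `parse`, `render`, and `expand` are hypothetical and written for illustration only. This is not the converter used by image2code and not the official Emmet library; grouping `( )`, climb-up `^`, text `{ }`, and `$` numbering are left out.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# One element: optional tag name, any .class / #id suffixes, optional *N repeat.
TOKEN = re.compile(r"([a-zA-Z][a-zA-Z0-9]*)?((?:[.#][\w-]+)*)(?:\*(\d+))?")
ATTR = re.compile(r"([.#])([\w-]+)")
# Void elements are emitted without a closing tag (only a few common ones listed).
VOID = {"img", "input", "br", "hr"}


@dataclass
class Node:
    tag: str
    classes: List[str] = field(default_factory=list)
    elem_id: Optional[str] = None
    count: int = 1
    children: List["Node"] = field(default_factory=list)


def parse(abbr: str) -> List[Node]:
    """Recursive descent: siblings := node ('+' node)*, node := element ('>' siblings)?"""
    pos = 0

    def siblings() -> List[Node]:
        nonlocal pos
        nodes = [node()]
        while pos < len(abbr) and abbr[pos] == "+":
            pos += 1
            nodes.append(node())
        return nodes

    def node() -> Node:
        nonlocal pos
        m = TOKEN.match(abbr, pos)  # every group is optional, so this always matches
        if m.end() == pos:  # an empty match means no element starts here
            raise ValueError(f"expected an element at position {pos} in {abbr!r}")
        pos = m.end()
        tag, attrs, count = m.groups()
        n = Node(tag or "div", count=int(count or 1))  # implicit tag is div
        for kind, name in ATTR.findall(attrs):
            if kind == ".":
                n.classes.append(name)
            else:
                n.elem_id = name
        if pos < len(abbr) and abbr[pos] == ">":
            pos += 1
            n.children = siblings()  # '>' takes the rest of the sibling list as children
        return n

    result = siblings()
    if pos != len(abbr):
        raise ValueError(f"unexpected {abbr[pos]!r} at position {pos} in {abbr!r}")
    return result


def render(nodes: List[Node]) -> str:
    html = []
    for n in nodes:
        attrs = f' id="{n.elem_id}"' if n.elem_id else ""
        if n.classes:
            cls = " ".join(n.classes)
            attrs += f' class="{cls}"'
        for _ in range(n.count):  # *N repeats the element (and its subtree) N times
            if n.tag in VOID:
                html.append(f"<{n.tag}{attrs}>")
            else:
                html.append(f"<{n.tag}{attrs}>{render(n.children)}</{n.tag}>")
    return "".join(html)


def expand(abbr: str) -> str:
    return render(parse(abbr))


print(expand("div.row>button.btn*2+input#q"))
# → <div class="row"><button class="btn"></button><button class="btn"></button><input id="q"></div>
```

Because `>` consumes the rest of the sibling list, `a>b+c` places both `b` and `c` inside `a`, which matches Emmet's left-to-right reading for this subset; supporting `^` would require tracking the current parent explicitly instead of relying on recursion.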
