Journal of East China Normal University (Natural Science) ›› 2022, Vol. 2022 ›› Issue (4): 67-78. doi: 10.3969/j.issn.1000-5641.2022.04.007

• Computer Science •

Assembly optimization of an AES-128-CTR algorithm based on a Cortex-M4 core

Dongxuan YANG1, Ganggang ZHANG2,*, Xinliang LIU1

  1. School of E-commerce and Logistics, Beijing Technology and Business University, Beijing 100048, China
  2. Digital Campus, Capital Normal University, Beijing 100048, China
  • Received: 2021-03-23 Online: 2022-07-25 Published: 2022-07-19
  • Contact: Ganggang ZHANG E-mail: yangdongxuan@btbu.edu.cn; zgg@cnu.edu.cn
  • Funding:
    National Key Research and Development Program of China, sub-project (2016YFD0401205); Beijing Natural Science Foundation (4202014); Beijing Municipal Science and Technology Commission program (Z191100008619007)

Abstract:

With the rapid development of the Internet of Things, embedded hardware products face major challenges in keeping data secure. The AES (Advanced Encryption Standard) algorithm offers strong resistance to attack, high speed, and flexible key length for data encryption and decryption. Because embedded microcontrollers lack the AES instruction-set extensions found on general-purpose CPUs (Central Processing Units), the algorithm runs far slower on microcontroller platforms than on such CPUs. To address this problem, AES in CTR (Counter) mode is implemented in assembly language and optimized for speed on a microcontroller platform based on the Cortex-M4 core instruction set. The core's characteristic barrel shifter and three-stage pipeline are exploited to optimize the round transformation, reducing the number of instruction cycles the algorithm requires. Tests on an FRDM-K82F development board show that the optimized assembly implementation is substantially more efficient than a C language implementation, and that it has advantages in both cost and power consumption over hardware AES encryption based on a coprocessor.

Key words: assembly optimization, AES, Cortex-M4
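
The abstract does not reproduce the assembly itself, so the following C sketch is only an illustration of one well-known way a barrel shifter can help the AES round transformation; it is an assumption, not the authors' code. Instead of four 1 KB lookup tables, a single table (here called `te0`) is used and the other three are obtained by rotating its entries by 8, 16, and 24 bits. On cores whose data-processing instructions accept a rotated register as the second operand, such a rotation can be folded into the XOR instruction. All function and variable names (`aes_init_tables`, `aes128_expand_key`, `aes128_encrypt_block`, `aes128_ctr_xor`, `te0`) are hypothetical, and the counter is assumed to be incremented as a 128-bit big-endian integer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static uint8_t  sbox[256];
static uint32_t te0[256];  /* single table: S[x] * (02, 01, 01, 03), first byte in the MSB */

static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static uint8_t  xtime(uint8_t x) { return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0)); }
static uint8_t  rotl8(uint8_t x, unsigned n) { return (uint8_t)((x << n) | (x >> (8 - n))); }
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}
static void store_be32(uint8_t *p, uint32_t w) {
    p[0] = (uint8_t)(w >> 24); p[1] = (uint8_t)(w >> 16); p[2] = (uint8_t)(w >> 8); p[3] = (uint8_t)w;
}
static uint32_t sub_word(uint32_t w) {
    return ((uint32_t)sbox[w >> 24] << 24) | ((uint32_t)sbox[(w >> 16) & 0xff] << 16)
         | ((uint32_t)sbox[(w >> 8) & 0xff] << 8) | sbox[w & 0xff];
}

/* Build the S-box (field inverse plus affine map) and the single T-table. Call once. */
static void aes_init_tables(void) {
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0));          /* p = p * 3 */
        q ^= (uint8_t)(q << 1); q ^= (uint8_t)(q << 2); q ^= (uint8_t)(q << 4);
        if (q & 0x80) q ^= 0x09;                                        /* q = q / 3 */
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;
    for (int i = 0; i < 256; i++) {
        uint8_t s = sbox[i], s2 = xtime(s);
        te0[i] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) | ((uint32_t)s << 8) | (uint32_t)(s2 ^ s);
    }
}

/* AES-128 key schedule: 11 round keys of four big-endian words each. */
static void aes128_expand_key(const uint8_t key[16], uint32_t rk[44]) {
    uint8_t rc = 1;
    for (int i = 0; i < 4; i++) rk[i] = load_be32(key + 4 * i);
    for (int i = 4; i < 44; i++) {
        uint32_t t = rk[i - 1];
        if (i % 4 == 0) {
            t = sub_word((t << 8) | (t >> 24)) ^ ((uint32_t)rc << 24);  /* RotWord, SubWord, Rcon */
            rc = xtime(rc);
        }
        rk[i] = rk[i - 4] ^ t;
    }
}

/* One block: rounds 1..9 use te0 plus rotations; the final round has no MixColumns. */
static void aes128_encrypt_block(const uint32_t rk[44], const uint8_t in[16], uint8_t out[16]) {
    uint32_t s[4], t[4];
    for (int c = 0; c < 4; c++) s[c] = load_be32(in + 4 * c) ^ rk[c];
    for (int r = 1; r < 10; r++) {
        for (int c = 0; c < 4; c++)
            t[c] = te0[s[c] >> 24]
                 ^ ror32(te0[(s[(c + 1) & 3] >> 16) & 0xff], 8)
                 ^ ror32(te0[(s[(c + 2) & 3] >> 8) & 0xff], 16)
                 ^ ror32(te0[s[(c + 3) & 3] & 0xff], 24)
                 ^ rk[4 * r + c];
        memcpy(s, t, sizeof s);
    }
    for (int c = 0; c < 4; c++) {
        uint32_t w = ((uint32_t)sbox[s[c] >> 24] << 24)
                   | ((uint32_t)sbox[(s[(c + 1) & 3] >> 16) & 0xff] << 16)
                   | ((uint32_t)sbox[(s[(c + 2) & 3] >> 8) & 0xff] << 8)
                   | (uint32_t)sbox[s[(c + 3) & 3] & 0xff];
        store_be32(out + 4 * c, w ^ rk[40 + c]);
    }
}

/* CTR mode: XOR data in place with E_K(ctr), E_K(ctr + 1), ...; ctr is updated. */
static void aes128_ctr_xor(const uint8_t key[16], uint8_t ctr[16], uint8_t *data, size_t len) {
    uint32_t rk[44];
    uint8_t ks[16];
    aes128_expand_key(key, rk);
    for (size_t off = 0; off < len; off += 16) {
        aes128_encrypt_block(rk, ctr, ks);
        for (size_t i = 0; i < 16 && off + i < len; i++) data[off + i] ^= ks[i];
        for (int i = 15; i >= 0 && ++ctr[i] == 0; i--) { }              /* big-endian increment */
    }
}
```

In this sketch `aes_init_tables()` must be called once before any other function. Because CTR mode only XORs a keystream into the data, calling `aes128_ctr_xor` again with the same key and starting counter decrypts. Whether a C compiler actually merges each rotation into the following XOR depends on the toolchain, which is one reason hand-written assembly may be preferred for this kind of optimization.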

CLC number: