Journal of East China Normal University (Natural Science), 2022, Vol. 2022, Issue (5): 115-125. doi: 10.3969/j.issn.1000-5641.2022.05.010

• Supply Chain Knowledge Graph Construction and Analysis •

Correlation operation based on intermediate layers for knowledge distillation

Haojie WU1, Yanjie WANG2, Wenbing CAI2, Fei WANG3, Yang LIU4, Peng PU5, Shaohui LIN4,*

  1. The 27th Research Institute of China Electronics Technology Group Corporation, Zhengzhou 450047, China
  2. Beijing Institute of Tracking and Telecommunication Technology, Beijing 100094, China
  3. Unit 63726 of the Chinese People’s Liberation Army, Yinchuan 750004, China
  4. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  5. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2022-07-08 Online: 2022-09-25 Published: 2022-09-26
  • Corresponding author: Shaohui LIN E-mail: shlin@cs.ecnu.edu.cn
  • Funding: National Natural Science Foundation of China (62102151); Shanghai Sailing Program (21YF1411200)

Abstract:

Convolutional neural networks have achieved remarkable success in artificial intelligence applications such as blockchain, speech recognition, and image understanding. However, gains in model accuracy have been accompanied by a substantial increase in computational cost and parameter count, leading to problems such as slow inference, large memory consumption, and difficulty of deployment on mobile devices. Knowledge distillation, a mainstream model compression method, transfers knowledge from a teacher network to a student network, improving the student's performance without increasing its number of parameters. How to extract representative knowledge for distillation has become the core problem in this field. In this paper, we propose a new knowledge distillation method based on a correlation operation over intermediate (hidden) layers. With the help of data augmentation, the method captures how image features are learned and transformed at each intermediate stage of the network. We model this transformation process with a correlation operation, extracting a new form of knowledge representation from the teacher network to guide the training of the student network. Experimental results show that the proposed method outperforms previous state-of-the-art methods on both the CIFAR-10 and CIFAR-100 datasets.

Key words: convolutional neural networks, model compression, knowledge distillation, knowledge representation, correlation operation
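The abstract does not define the correlation operator itself. As a rough illustration of the general idea (distilling how features change between consecutive intermediate stages, rather than the features themselves), the minimal sketch below substitutes an FSP-style channel correlation matrix between two consecutive stages and penalizes the squared gap between the teacher's and the student's matrices. This is not the authors' operator. The function names `stage_correlation` and `correlation_kd_loss` are hypothetical, the data-augmentation step is omitted, and the sketch assumes that teacher and student share channel widths at the tapped stages and that each pair of consecutive stages shares a spatial size.

```python
import numpy as np


def stage_correlation(f_in: np.ndarray, f_out: np.ndarray) -> np.ndarray:
    """FSP-style correlation between two consecutive stages (illustrative).

    f_in:  (C_in, H, W) feature map at the start of a stage.
    f_out: (C_out, H, W) feature map at the end of the same stage.
    Returns a (C_in, C_out) matrix whose entry (i, j) is the spatially
    averaged product of channel i of f_in and channel j of f_out.
    """
    if f_in.shape[1:] != f_out.shape[1:]:
        raise ValueError("both maps must share H and W (pool one of them first)")
    c_in, h, w = f_in.shape
    a = f_in.reshape(c_in, h * w)             # (C_in, H*W)
    b = f_out.reshape(f_out.shape[0], h * w)  # (C_out, H*W)
    return a @ b.T / (h * w)


def correlation_kd_loss(teacher_feats, student_feats) -> float:
    """Mean squared gap between teacher and student stage correlations.

    Each argument is a list of per-stage feature maps for one image.
    Assumption (hypothetical setup): teacher and student have matching
    channel widths at every tapped stage; otherwise a learned 1x1
    projection would be needed before comparing the matrices.
    """
    if len(teacher_feats) != len(student_feats) or len(teacher_feats) < 2:
        raise ValueError("need the same number (>= 2) of stages for both nets")
    n_pairs = len(teacher_feats) - 1
    loss = 0.0
    for k in range(n_pairs):
        # Correlation of stage k with stage k+1, for each network.
        g_t = stage_correlation(teacher_feats[k], teacher_feats[k + 1])
        g_s = stage_correlation(student_feats[k], student_feats[k + 1])
        loss += float(np.mean((g_t - g_s) ** 2))
    return loss / n_pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "features": 3 stages, 8 channels, 4x4 spatial maps.
    teacher = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
    student = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
    print(correlation_kd_loss(teacher, teacher))  # 0.0: identical features
    print(correlation_kd_loss(teacher, student))  # positive: mismatch
```

In training, a term like this would typically be added, with some weighting factor, to the student's ordinary classification loss. The weighting, the choice of tapped stages, and how augmented views enter the loss are left open here, since the abstract does not specify them.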
