[1] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[2] HE Y, SAINATH T N, PRABHAVALKAR R, et al. Streaming end-to-end speech recognition for mobile devices [C]//ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019: 6381-6385.
[3] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24)[2020-07-02]. https://arxiv.org/pdf/1810.04805.pdf.
[4] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2015-04-10)[2020-07-02]. https://arxiv.org/pdf/1409.1556.pdf.
[5] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
[6] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2261-2269. DOI: 10.1109/CVPR.2017.243.
[7] CHENG Y, WANG D, ZHOU P, et al. A survey of model compression and acceleration for deep neural networks [EB/OL]. (2020-06-14)[2020-07-02]. https://arxiv.org/pdf/1710.09282.pdf.
[8] LEI J, GAO X, SONG J, et al. Survey of deep neural network model compression [J]. Journal of Software, 2018, 29(2): 251-266. (in Chinese)
[9] CHOUDHARY T, MISHRA V, GOSWAMI A, et al. A comprehensive survey on model compression and acceleration [J/OL]. Artificial Intelligence Review, 2020. (2020-02-08)[2020-07-02]. https://doi.org/10.1007/s10462-020-09816-7.
[10] LI J Y, ZHAO Y K, XUE Z E, et al. A survey of model compression for deep neural networks [J]. Chinese Journal of Engineering, 2019, 41(10): 1229-1239. (in Chinese)
[11] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2018: 1967-1976.
[12] CHEN X L, GIRSHICK R, HE K M, et al. TensorMask: A foundation for dense object segmentation [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 2061-2069.
[13] SANH V, DEBUT L, CHAUMOND J, et al. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter [EB/OL]. (2020-01-24)[2020-07-01]. https://arxiv.org/pdf/1910.01108v3.pdf.
[14] QIN Z, LI Z, ZHANG Z, et al. ThunderNet: Towards real-time generic object detection on mobile devices [C]//Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2019: 6718-6727.
[15] ANWAR S, SUNG W. Coarse pruning of convolutional neural networks with random masks [EB/OL]. [2020-07-02]. https://openreview.net/pdf?id=HkvS3Mqxe.
[16] LECUN Y, DENKER J S, SOLLA S A. Optimal brain damage [C]//Advances in Neural Information Processing Systems. 1989: 598-605.
[17] HASSIBI B, STORK D G. Second order derivatives for network pruning: Optimal brain surgeon [C]//Advances in Neural Information Processing Systems. 1993: 164-171.
[18] ZHANG T, YE S, ZHANG K, et al. A systematic DNN weight pruning framework using alternating direction method of multipliers [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 184-199.
[19] MA X L, GUO F M, NIU W, et al. PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices [C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20). 2020: 5117-5124.
[20] HE Y, ZHANG X, SUN J. Channel pruning for accelerating very deep neural networks [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 1389-1397.
[21] CHIN T W, DING R, ZHANG C, et al. Towards efficient model compression via learned global ranking [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1518-1528.
[22] MOLCHANOV P, MALLYA A, TYREE S, et al. Importance estimation for neural network pruning [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 11264-11272.
[23] LUO J H, WU J, LIN W. ThiNet: A filter level pruning method for deep neural network compression [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 5058-5066.
[24] ZHUANG Z W, TAN M K, ZHUANG B, et al. Discrimination-aware channel pruning for deep neural networks [C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). New York: Curran Associates Inc., 2018: 883-894.
[25] HE Y, LIU P, WANG Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4340-4349.
[26] LIN M, JI R, WANG Y, et al. HRank: Filter pruning using high-rank feature map [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1529-1538.
[27] LIN X, ZHAO C, PAN W. Towards accurate binary convolutional neural network [C]//Advances in Neural Information Processing Systems. 2017: 345-353.
[28] LIU Z, WU B, LUO W, et al. Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 722-737.
[29] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Binarized neural networks [C]//Advances in Neural Information Processing Systems. 2016: 4107-4115.
[30] LI F F, ZHANG B, LIU B. Ternary weight networks [EB/OL]. (2016-11-19)[2020-07-03]. https://arxiv.org/pdf/1605.04711.pdf.
[31] WANG P, CHENG J. Fixed-point factorized networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4012-4020.
[32] BOROUMAND A, GHOSE S, KIM Y, et al. Google workloads for consumer devices: Mitigating data movement bottlenecks [C]//Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 2018: 316-331.
[33] HAN S, MAO H Z, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding [EB/OL]. (2015-11-20)[2020-07-03]. https://arxiv.org/pdf/1510.00149v3.pdf.
[34] CHEN W, WILSON J, TYREE S, et al. Compressing neural networks with the hashing trick [C]//International Conference on Machine Learning. 2015: 2285-2294.
[35] STOCK P, JOULIN A, GRIBONVAL R, et al. And the bit goes down: Revisiting the quantization of neural networks [EB/OL]. (2019-12-20)[2020-07-02]. https://arxiv.org/pdf/1907.05686.pdf.
[36] CARREIRA-PERPIÑÁN M A, IDELBAYEV Y. Model compression as constrained optimization, with application to neural nets. Part II: Quantization [EB/OL]. (2017-07-13)[2020-07-03]. https://arxiv.org/pdf/1707.04319.pdf.
[37] ZHU S, DONG X, SU H. Binary ensemble neural network: More bits per network or more networks per bit? [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4923-4932.
[38] WANG Z, LU J, TAO C, et al. Learning channel-wise interactions for binary convolutional neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 568-577.
[39] LIU C, DING W, XIA X, et al. Circulant binary convolutional networks: Enhancing the performance of 1-bit DCNNs with circulant back propagation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2691-2699.
[40] COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: Training deep neural networks with binary weights during propagations [C]//Advances in Neural Information Processing Systems. 2015: 3123-3131.
[41] QIN H T, GONG R H, LIU X L, et al. Forward and backward information retention for accurate binary neural networks [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020: 2247-2256.
[42] WANG P, HU Q, ZHANG Y, et al. Two-step quantization for low-bit neural networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4376-4384.
[43] MELLEMPUDI N, KUNDU A, MUDIGERE D, et al. Ternary neural networks with fine-grained quantization [EB/OL]. (2017-05-30)[2020-07-03]. https://arxiv.org/pdf/1705.01462.pdf.
[44] ZHU F, GONG R, YU F, et al. Towards unified INT8 training for convolutional neural network [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1969-1979.
[45] RASTEGARI M, ORDONEZ V, REDMON J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks [C]//European Conference on Computer Vision. Cham: Springer, 2016: 525-542.
[46] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [EB/OL]. (2015-03-09)[2020-07-04]. https://arxiv.org/pdf/1503.02531.pdf.
[47] TIAN Y L, KRISHNAN D, ISOLA P. Contrastive representation distillation [EB/OL]. (2020-01-18)[2020-07-04]. https://arxiv.org/pdf/1910.10699.pdf.
[48] FURLANELLO T, LIPTON Z C, TSCHANNEN M, et al. Born again neural networks [C]//Proceedings of the 35th International Conference on Machine Learning. 2018: 1602-1611.
[49] GAO M Y, SHEN Y J, LI Q Q, et al. Residual knowledge distillation [EB/OL]. (2020-02-21)[2020-07-04]. https://arxiv.org/pdf/2002.09168.pdf.
[50] HE T, SHEN C, TIAN Z, et al. Knowledge adaptation for efficient semantic segmentation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 578-587.
[51] LIN M, CHEN Q, YAN S C. Network in network [EB/OL]. (2014-03-04)[2020-07-04]. https://arxiv.org/pdf/1312.4400/.
[52] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. (2016-11-04)[2020-07-04]. https://arxiv.org/pdf/1602.07360.pdf.
[53] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [EB/OL]. (2017-04-17)[2020-07-04]. https://arxiv.org/pdf/1704.04861.pdf.
[54] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: Inverted residuals and linear bottlenecks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.
[55] HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3 [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
[56] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
[57] MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
[58] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
[59] HAN K, WANG Y, TIAN Q, et al. GhostNet: More features from cheap operations [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
[60] CHEN Y, FAN H, XU B, et al. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 3435-3444.
[61] CAMPBELL F W, ROBSON J G. Application of Fourier analysis to the visibility of gratings [J]. The Journal of Physiology, 1968, 197(3): 551-566.
[62] HUANG G, LIU S, VAN DER MAATEN L, et al. CondenseNet: An efficient DenseNet using learned group convolutions [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2752-2761.
[63] JADERBERG M, VEDALDI A, ZISSERMAN A. Speeding up convolutional neural networks with low rank expansions [EB/OL]. (2014-05-15)[2020-07-04]. https://arxiv.org/pdf/1405.3866.pdf.
[64] POLINO A, PASCANU R, ALISTARH D. Model compression via distillation and quantization [EB/OL]. (2018-02-15)[2020-07-04]. https://arxiv.org/pdf/1802.05668.pdf.
[65] CAI R C, ZHONG C R, YU Y, et al. Convolutional neural network quantization and compression for edge-oriented applications [J]. Journal of Computer Applications, 2018, 38(9): 2449-2454. (in Chinese)
[66] YU X Y, LIU T L, WANG X C, et al. On compressing deep models by low rank and sparse decomposition [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7370-7379.
[67] CHENG J, WU J X, LENG C, et al. Quantized CNN: A unified approach to accelerate and compress convolutional networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4730-4743.
[68] HU H Y, PENG R, TAI Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures [EB/OL]. (2016-07-12)[2020-07-04]. https://arxiv.org/pdf/1607.03250.pdf.
[69] WANG R J, LI X, LING C X. Pelee: A real-time object detection system on mobile devices [C]//Advances in Neural Information Processing Systems. 2018: 1963-1972.
[70] LI Y, LI J, LIN W, et al. Tiny-DSOD: Lightweight object detection for resource-restricted usages [EB/OL]. (2018-07-29)[2020-07-04]. https://arxiv.org/pdf/1807.11013.pdf.
[71] TAN M, PANG R, LE Q V. EfficientDet: Scalable and efficient object detection [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10781-10790.
[72] LI R, WANG Y, LIANG F, et al. Fully quantized network for object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2810-2819.