Journal of East China Normal University (Natural Science) ›› 2021, Vol. 2021 ›› Issue (6): 112-123. doi: 10.3969/j.issn.1000-5641.2021.06.012

• Computer Science •


Research on an Edge-Cloud collaborative acceleration mechanism for deep models based on network compression and partitioning

Nuo WANG, Liying LI, Dongwei QIAN, Tongquan WEI*

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2020-09-10; Online: 2021-11-25; Published: 2021-11-26
  • Contact: Tongquan WEI, E-mail: tqwei@cs.ecnu.edu

Abstract:

The advanced capabilities of artificial intelligence (AI) have been widely used to process large volumes of data in real time to achieve rapid response. However, conventional methods for deploying various AI-based applications incur substantial computational and communication overhead. To solve this problem, a deep model Edge-Cloud collaborative acceleration mechanism based on network compression and partitioning technology is proposed. This technology compresses and partitions deep neural network (DNN) models so that AI models can be deployed in practical applications in the form of an Edge-Cloud collaboration for rapid response. As a first step, the proposed method compresses the neural network to reduce its execution latency and generates new layers that can serve as candidate partition points. It then trains a series of prediction models to find the best partition point and splits the compressed neural network into two parts. The two parts are deployed on the edge device and the cloud server, respectively, and collaborate to minimize the overall latency. Experimental results show that, compared with four benchmark methods, the proposed scheme reduces the total latency of the deep model by at least 70%.
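
The abstract does not give the concrete form of the latency prediction models, so the Python sketch below illustrates only the partition-point search step under stated assumptions: per-layer edge and cloud latency predictions, activation sizes, and an uplink bandwidth are taken as given constants, and every name and number in it is hypothetical rather than taken from the paper.

    import numpy as np

    # Hypothetical per-layer profile for a compressed 5-layer network.
    # All numbers are illustrative assumptions, not measurements from the paper.
    layers = ["conv1", "conv2", "conv3", "fc1", "fc2"]
    edge_latency = np.array([4.0, 6.5, 8.0, 3.0, 1.0])    # predicted per-layer time on the edge device (ms)
    cloud_latency = np.array([0.8, 1.2, 1.5, 0.5, 0.2])   # predicted per-layer time on the cloud server (ms)
    activation_kb = np.array([512.0, 256.0, 128.0, 16.0, 4.0])  # output size of each layer (KB)
    input_kb = 1024.0              # size of the raw input uploaded in the cloud-only case (KB)
    bandwidth_kb_per_ms = 25.0     # assumed uplink bandwidth

    def total_latency(cut):
        """Predicted end-to-end latency when layers [0, cut) run on the edge,
        the intermediate activation is uploaded, and layers [cut, n) run in
        the cloud. cut == 0 is cloud-only; cut == len(layers) is edge-only."""
        if cut == 0:
            upload = input_kb / bandwidth_kb_per_ms          # send the raw input
        elif cut == len(layers):
            upload = 0.0                                     # nothing to send
        else:
            upload = activation_kb[cut - 1] / bandwidth_kb_per_ms
        return edge_latency[:cut].sum() + upload + cloud_latency[cut:].sum()

    # Evaluate every candidate partition point and keep the one that
    # minimizes the predicted total latency.
    best_cut = min(range(len(layers) + 1), key=total_latency)
    print(f"run first {best_cut} layer(s) on the edge; "
          f"predicted total latency: {total_latency(best_cut):.1f} ms")

In the paper's scheme, the candidate cut points would be the new layers produced by compression, and the per-layer latencies would come from the trained prediction models rather than the fixed constants assumed here.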

Key words: Edge-Cloud collaboration, DNN compression, DNN partitioning
