华东师范大学学报(自然科学版) ›› 2024, Vol. 2024 ›› Issue (2): 76-85.doi: 10.3969/j.issn.1000-5641.2024.02.009

• 计算机科学 •

基于并行块的自适应量化随机计算

张永卓, 诸葛晴凤, 沙行勉*, 宋玉红

  1. 华东师范大学 计算机科学与技术学院, 上海 200062
  • 收稿日期:2023-01-04 出版日期:2024-03-25 发布日期:2024-03-18
  • 通讯作者: 沙行勉 E-mail:edwinsha@cs.ecnu.edu.cn
  • 基金资助:
    国家自然科学基金 (61972154); 上海市科委项目 (20511101600)

Parallel block-based stochastic computing with adapted quantization

Yongzhuo ZHANG, Qingfeng ZHUGE, Edwin Hsing-Mean SHA*, Yuhong SONG

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received:2023-01-04 Online:2024-03-25 Published:2024-03-18
  • Contact: Edwin Hsing-Mean SHA E-mail:edwinsha@cs.ecnu.edu.cn

摘要:

深度神经网络模型庞大的存储和计算需求, 限制了其在面积和功耗受限的嵌入式设备上的部署. 为了解决这一问题, 随机计算将数据表示为随机序列, 继而通过基本逻辑运算单元实现加法和乘法等算术运算, 以减小神经网络的存储空间并降低计算复杂度. 然而, 当随机序列的长度较短时, 网络权重在从浮点数转换到随机序列的过程中存在离散化误差, 这会降低随机计算网络模型的推理准确率. 尽管使用更长的随机序列可以扩大随机序列的表示范围以缓解这一问题, 但也会导致更长的计算时延和更大的能源功耗. 本文提出了一种基于傅立叶变换的可微量化函数, 在网络的训练过程中提高模型与随机序列的匹配度, 从而减小数据转换过程中的离散化误差, 保证较短随机序列下随机计算神经网络的准确率. 此外, 还设计了一种加法器, 用于提高运算单元的准确性, 并通过将输入分块进行并行计算以进一步缩短时延. 实验表明, 与其他方法相比, 本文方法可将模型推理准确率提高20%, 并将计算时延缩短50%.
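作为示意, 下面用 Python 给出单极性随机计算中乘法的一个最小草图: 数值被编码为"1 的比例近似等于该数值"的随机比特序列, 两序列按位与(AND)即近似实现乘法. 其中的函数名与参数均为说明性假设, 并非本文所用的具体编码或电路实现.

```python
import random

def to_stream(p, length, rng):
    # 将 [0, 1] 内的数值 p 编码为长度为 length 的随机比特序列,
    # 序列中 1 的比例近似等于 p(单极性编码)
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(s1, s2):
    # 单极性随机计算中, 两序列按位与即近似实现乘法
    return [a & b for a, b in zip(s1, s2)]

def to_value(stream):
    # 解码: 序列中 1 的比例即为所表示的数值
    return sum(stream) / len(stream)

rng = random.Random(0)
length = 1024
x, w = 0.5, 0.5
sx = to_stream(x, length, rng)
sw = to_stream(w, length, rng)
prod = to_value(sc_multiply(sx, sw))
# prod 近似等于 x * w = 0.25, 序列越长估计误差越小
```

序列越长, 估计方差越小, 这正对应摘要中"更长的随机序列可缓解离散化误差、但带来更长时延与更高功耗"的权衡.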

关键词: 随机计算, 量化, 神经网络优化

Abstract:

The storage and computation demands of deep neural network models make them unsuitable for deployment on embedded devices with limited area and power budgets. To address this issue, stochastic computing reduces the storage and computational complexity of neural networks by representing data as stochastic sequences and implementing arithmetic operations such as addition and multiplication with basic logic units. However, short stochastic sequences introduce discretization errors when network weights are converted from floating-point numbers to stochastic sequences, which reduces the inference accuracy of stochastic computing network models. Longer stochastic sequences extend the representation range and alleviate this problem, but they also result in longer computational latency and higher energy consumption. We propose a differentiable quantization function based on the Fourier transform. The function improves the matching of the model to stochastic sequences during training, reducing the discretization error introduced during data conversion and thereby preserving the accuracy of stochastic computing neural networks with short stochastic sequences. Additionally, we present an adder designed to enhance the accuracy of the operation unit and parallelize computation by splitting inputs into blocks, further reducing latency. Experimental results demonstrate a 20% improvement in model inference accuracy compared with other methods, as well as a 50% reduction in computational latency.
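As a rough illustration of the idea behind a Fourier-based differentiable quantization function (a sketch under illustrative assumptions; the function name, level count, and number of series terms are not the paper's construction), a staircase quantizer round(Nx)/N can be smoothed by truncating the Fourier series of the sawtooth wave t - round(t), which makes the result differentiable everywhere:

```python
import math

def soft_quantize(x, n_levels=8, n_terms=6):
    # Smooth approximation of the staircase quantizer round(N*x)/N,
    # obtained by truncating the Fourier series of the sawtooth wave
    # t - round(t) (illustrative sketch, not the paper's exact formula).
    t = n_levels * x
    # Fourier series of the sawtooth: sum_k (-1)^(k+1) sin(2*pi*k*t) / (pi*k)
    saw = sum((-1) ** (k + 1) * math.sin(2 * math.pi * k * t) / (math.pi * k)
              for k in range(1, n_terms + 1))
    # t - saw approximates round(t); rescale back to the original range
    return (t - saw) / n_levels
```

Because the truncated series is smooth, the function has a well-defined gradient with respect to its input everywhere, which is what allows a quantizer of this kind to be placed directly inside backpropagation during training.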

Key words: stochastic computing, quantization, neural network optimization
