Journal of East China Normal University (Natural Science) ›› 2024, Vol. 2024 ›› Issue (2): 76-85. doi: 10.3969/j.issn.1000-5641.2024.02.009

• Computer Science •

Parallel block-based stochastic computing with adapted quantization

Yongzhuo ZHANG, Qingfeng ZHUGE, Edwin Hsing-Mean SHA*, Yuhong SONG

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2023-01-04 Online: 2024-03-25 Published: 2024-03-18
  • Contact: Edwin Hsing-Mean SHA E-mail: edwinsha@cs.ecnu.edu.cn

Abstract:

Deep neural network models demand substantial computation and storage, which makes them difficult to deploy on embedded devices with limited area and power budgets. Stochastic computing addresses this issue by representing data as stochastic bit sequences and performing arithmetic operations such as addition and multiplication with basic logic units, thereby reducing the storage and computational complexity of neural networks. However, short stochastic sequences introduce discretization errors when network weights are converted from floating-point numbers to stochastic sequences, which degrades the inference accuracy of stochastic computing network models. Longer stochastic sequences extend the representation range and alleviate this problem, but they also incur longer computational latency and higher energy consumption. We propose a differentiable quantization function based on the Fourier transform. During training, the function improves how well the model matches the stochastic sequences, reducing the discretization error incurred during data conversion and thereby preserving the accuracy of stochastic computing neural networks even with short sequences. In addition, we present an adder that improves the accuracy of the operation unit and parallelizes computation by processing inputs in blocks, thereby reducing latency. Experimental results show a 20% improvement in model inference accuracy over comparable methods and a 50% reduction in computational latency.
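
To make the abstract's two key mechanisms concrete, the following Python sketch (not the authors' implementation; the sequence length, number of harmonics, and quantization levels are illustrative assumptions) shows (i) unipolar stochastic computing, where a value in [0, 1] is encoded as a bit stream and multiplication reduces to a bitwise AND, and (ii) a differentiable quantizer built from a truncated Fourier series that approximates the rounding staircase so that gradients can flow during training.

```python
import numpy as np

def to_stream(p: float, length: int, rng: np.random.Generator) -> np.ndarray:
    """Encode p in [0, 1] as a bit stream whose fraction of 1s is ~p (unipolar SC)."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a: np.ndarray, b: np.ndarray) -> float:
    """Unipolar SC multiplication: bitwise AND of two streams, decoded by averaging."""
    return float(np.mean(a & b))

def soft_quantize(x: np.ndarray, levels: int, harmonics: int) -> np.ndarray:
    """Differentiable approximation of uniform quantization on [0, 1].

    Uses round(t) ~= t - sum_{k=1..K} (-1)^(k+1) * sin(2*pi*k*t) / (pi*k),
    i.e. the truncated Fourier series of the sawtooth t - round(t), so the
    non-differentiable staircase is replaced by a smooth surrogate.
    """
    t = x * (levels - 1)
    k = np.arange(1, harmonics + 1)
    sawtooth = np.sum(
        ((-1.0) ** (k + 1)) * np.sin(2 * np.pi * np.outer(t, k)) / (np.pi * k),
        axis=-1,
    )
    return (t - sawtooth) / (levels - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = 0.75, 0.5
    prod = sc_multiply(to_stream(a, 1024, rng), to_stream(b, 1024, rng))
    print(f"SC product of {a} and {b} ~= {prod:.3f} (exact: {a * b})")
    print(soft_quantize(np.array([0.12, 0.47, 0.9]), levels=8, harmonics=16))
```

With independent input streams, the AND gate's output probability is the product of the encoded values, and longer streams shrink the sampling error; this is the accuracy-versus-latency trade-off that motivates the paper's short-sequence quantization and block-parallel adder.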

Key words: stochastic computing, quantization, neural network optimization

CLC Number: