Journal of East China Normal University (Natural Science) ›› 2024, Vol. 2024 ›› Issue (2): 76-85. doi: 10.3969/j.issn.1000-5641.2024.02.009

• Computer Science •

Parallel block-based stochastic computing with adapted quantization

Yongzhuo ZHANG, Qingfeng ZHUGE, Edwin Hsing-Mean SHA*, Yuhong SONG

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2023-01-04 Online: 2024-03-25 Published: 2024-03-18
  • Contact: Edwin Hsing-Mean SHA E-mail: edwinsha@cs.ecnu.edu.cn

Abstract:

Deep neural network models demand substantial computation and storage, which makes them difficult to deploy on embedded devices with limited area and power budgets. Stochastic computing addresses this issue by representing data as stochastic bit sequences and performing arithmetic operations such as addition and multiplication with basic logic units, thereby reducing the storage and computational complexity of neural networks. However, short stochastic sequences introduce discretization errors when network weights are converted from floating-point numbers to stochastic sequences, which degrades the inference accuracy of stochastic computing network models. Longer stochastic sequences extend the representation range and alleviate this problem, but they also incur longer computational latency and higher energy consumption. We propose a differentiable quantization function based on the Fourier transform. During training, the function improves how well the model matches the stochastic sequences, reducing the discretization error incurred during data conversion and thereby preserving the accuracy of stochastic computing neural networks even with short sequences. In addition, we present an adder that improves the accuracy of the operation unit and parallelizes computation by processing inputs in blocks, thereby reducing latency. Experimental results show a 20% improvement in model inference accuracy over comparable methods and a 50% reduction in computational latency.
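
To make the abstract's two key mechanisms concrete, the following Python sketch (not the authors' implementation; the sequence length, number of harmonics, and quantization levels are illustrative assumptions) shows (i) unipolar stochastic computing, where a value in [0, 1] is encoded as a bit stream and multiplication reduces to a bitwise AND, and (ii) a differentiable quantizer built from a truncated Fourier series that approximates the rounding staircase so that gradients can flow during training.

```python
import numpy as np

def to_stream(p: float, length: int, rng: np.random.Generator) -> np.ndarray:
    """Encode p in [0, 1] as a bit stream whose fraction of 1s is ~p (unipolar SC)."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a: np.ndarray, b: np.ndarray) -> float:
    """Unipolar SC multiplication: bitwise AND of two streams, decoded by averaging."""
    return float(np.mean(a & b))

def soft_quantize(x: np.ndarray, levels: int, harmonics: int) -> np.ndarray:
    """Differentiable approximation of uniform quantization on [0, 1].

    Uses round(t) ~= t - sum_{k=1..K} (-1)^(k+1) * sin(2*pi*k*t) / (pi*k),
    i.e. the truncated Fourier series of the sawtooth t - round(t), so the
    non-differentiable staircase is replaced by a smooth surrogate.
    """
    t = x * (levels - 1)
    k = np.arange(1, harmonics + 1)
    sawtooth = np.sum(
        ((-1.0) ** (k + 1)) * np.sin(2 * np.pi * np.outer(t, k)) / (np.pi * k),
        axis=-1,
    )
    return (t - sawtooth) / (levels - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = 0.75, 0.5
    prod = sc_multiply(to_stream(a, 1024, rng), to_stream(b, 1024, rng))
    print(f"SC product of {a} and {b} ~= {prod:.3f} (exact: {a * b})")
    print(soft_quantize(np.array([0.12, 0.47, 0.9]), levels=8, harmonics=16))
```

With independent input streams, the AND gate's output probability is the product of the encoded values, and longer streams shrink the sampling error; this is the accuracy-versus-latency trade-off that motivates the paper's short-sequence quantization and block-parallel adder.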

Key words: stochastic computing, quantization, neural network optimization

CLC Number: