Detailed Information

Cited 6 times in Web of Science; cited 8 times in Scopus

Multipurpose Deep-Learning Accelerator for Arbitrary Quantization With Reduction of Storage, Logic, and Latency Waste

Authors
Moon, Seunghyun; Mun, Han-Gyeol; Son, Hyunwoo; Sim, Jae-Yoon
Issue Date
Jan-2024
Publisher
Institute of Electrical and Electronics Engineers
Keywords
Arbitrary quantization (AQ); bit-serial processing; Computer architecture; Decoding; deep neural network (DNN) accelerator; Hardware; lookup table (LUT); precision scalability; Quantization (signal); run-length compression (RLC); Table lookup; Task analysis
Citation
IEEE Journal of Solid-State Circuits, v.59, no.1, pp. 1-14
Pages
14
Indexed
SCIE
SCOPUS
Journal Title
IEEE Journal of Solid-State Circuits
Volume
59
Number
1
Start Page
1
End Page
14
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/68356
DOI
10.1109/JSSC.2023.3312615
ISSN
0018-9200 (print)
1558-173X (electronic)
Abstract
Various pruning and quantization heuristics have been proposed to compress recent deep-learning models. However, the rapid development of new optimization techniques makes it difficult for domain-specific accelerators to efficiently process models with irregularly stored parameters or nonlinear quantization. This article presents a scalable-precision deep-learning accelerator that supports multiply-and-accumulate (MAC) operations on two arbitrarily quantized data sequences. The proposed accelerator includes three main features. First, to minimize logic overhead when processing arbitrarily quantized data of up to 8-bit precision, a lookup table (LUT)-based runtime reconfiguration is proposed. Second, bit-serial execution that skips unnecessary computations enables the multiplication of operands with unequal precision while minimizing logic and latency waste. Third, two data formats, raw and run-length compressed, are supported by a zero-eliminator (ZE) and a runtime-density detector (RDD) compatible with both, improving storage efficiency and performance. For a precision range of 1–8 bits and a fixed sparsity of 30%, the accelerator, implemented in 28 nm low-power (LP) CMOS, achieves a peak performance of 0.87–5.55 TOPS and a power efficiency of 15.1–95.9 TOPS/W. The accelerator supports processing with arbitrary quantization (AQ) while achieving state-of-the-art (SOTA) power efficiency.
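
As a rough illustration of the ideas summarized in the abstract, the following Python sketch (not taken from the paper; all function names, LUT levels, and operands are hypothetical) shows how LUT-based dequantization, a zero-skipping bit-serial MAC, and run-length compression of zeros fit together arithmetically.

```python
import numpy as np

# Illustrative sketch only, not the authors' design: it mimics three ideas
# from the abstract in plain Python -- (1) LUT-based dequantization of
# arbitrarily (nonlinearly) quantized codes, (2) a bit-serial MAC that skips
# all-zero weight bit-planes so unequal precisions waste no cycles, and
# (3) run-length compression (RLC) of zero runs in a sparse code stream.
# All names, LUT levels, and operands below are hypothetical.

def rlc_encode(codes):
    """Compress zeros: emit (zero_run_length, nonzero_code) pairs plus tail run."""
    pairs, run = [], 0
    for c in codes:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run

def bit_serial_lut_mac(act_codes, wgt_codes, act_lut, wgt_bits):
    """Compute sum_i act_lut[a_i] * w_i, consuming weights one bit-plane per 'cycle'."""
    acts = act_lut[np.asarray(act_codes)]     # runtime LUT-based dequantization
    w = np.asarray(wgt_codes)
    acc = 0.0
    for b in range(wgt_bits):                 # LSB-first bit-serial planes
        plane = (w >> b) & 1
        if not plane.any():                   # all-zero plane: skip the cycle
            continue
        acc += float((acts * plane).sum()) * (1 << b)  # shift-and-add partial sum
    return acc

# 2-bit activations with non-uniform (arbitrary) levels, 3-bit integer weights.
act_lut = np.array([0.0, 0.3, 0.8, 1.9], dtype=np.float32)
a = [1, 0, 0, 3, 2]                           # sparse activation codes
w = [5, 4, 0, 2, 7]
print(rlc_encode(a))                          # ([(0, 1), (2, 3), (0, 2)], 0)
print(bit_serial_lut_mac(a, w, act_lut, wgt_bits=3))      # ~10.9 (bit-serial)
print(float((act_lut[np.array(a)] * np.array(w)).sum()))  # ~10.9 (dense check)
```

In hardware terms, the LUT lookup would correspond to a small decode stage shared across MAC lanes, and the zero-plane skip loosely models the latency saving the abstract attributes to eliminating unnecessary bit-serial computations; a ZE/RDD pair would then decide at runtime whether the raw or the run-length-compressed stream is cheaper to store and fetch.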
Files in This Item
There are no files associated with this item.
Appears in
Collections
College of Engineering > Department of Electronic Engineering > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Son, Hyun Woo
College of IT Engineering (Department of Electronic Engineering)
