Detailed Information

Cited 0 times in Web of Science; cited 14 times in Scopus

A 127.8TOPS/W Arbitrarily Quantized 1-to-8b Scalable-Precision Accelerator for General-Purpose Deep Learning with Reduction of Storage, Logic and Latency Waste

Full metadata record
DC Field: Value
dc.contributor.author: Moon, S.
dc.contributor.author: Mun, H.-G.
dc.contributor.author: Son, H.
dc.contributor.author: Sim, J.-Y.
dc.date.accessioned: 2023-04-25T04:40:15Z
dc.date.available: 2023-04-25T04:40:15Z
dc.date.issued: 2023-02
dc.identifier.issn: 0193-6530
dc.identifier.uri: https://scholarworks.gnu.ac.kr/handle/sw.gnu/59274
dc.description.abstract: Research on deep learning accelerators has focused on inference tasks, improving performance by maximally exploiting sparsity and quantization. Unlike CNN-only networks, however, recent state-of-the-art (SOTA) models consist of multiple blocks of various layers with different layer-by-layer characteristics in sparsity and required precision. This trend presents challenges in building a general accelerator architecture that maximizes the benefits of sparsity and quantization while supporting efficient processing for models ranging from traditional CNNs to the new models to come.

First, there are multiple considerations, including the bottleneck in data bandwidth as well as the trade-off between sparsity and required precision. The required precision is likely to increase as the sparsity increases, which underpins the need for flexibility in setting the quantization with a layer-by-layer configuration. In addition, storing data in a unified format can also prevent maximum utilization of hardware resources. Since recent models have large variations in sparsity [11], a major portion of data movement might be spent sending zeros, causing a severe waste of data bandwidth. We propose a sparsity-aware accelerator that adaptively changes the data format by detecting the sparsity of the given task: data is stored in raw format when the sparse rate is low and in compressed format (run-length coding, RLC) when the sparse rate is high.

Second, there is a correlation between the effective precision and the quantization policy. Arbitrary quantization has demonstrated a higher quality of result (QoR) than linear quantization (denoted as INT). There have been two representative approaches to nonlinear quantization: 1) arbitrary basis (AB), where quantized values are given by linear combinations of n independent bases, and 2) arbitrary quantization (AQ), which has arbitrary 2^n quantized values. Although these quantization schemes achieve good accuracy, there has been no hardware implementation for efficient processing of AQ. Conventional INT multiplication increases in complexity by 4x as both input precisions double; on the other hand, if AQ with a scalable precision of up to 8b were implemented using a look-up-table (LUT) approach, the hardware complexity would explode. To resolve this problem, we propose a hierarchical decoding architecture for AQ with a scalable precision of up to 8b.

Finally, the required precisions for inputs and weights are not the same [4], [10]. Good QoR is realized by assigning more bits to inputs and fewer bits to weights. Previous accelerators handle inputs and weights with a fixed and equal precision, leading to wasted computational energy. This work employs dynamic-precision bit-serial multiplication for the weights to minimize energy waste.

Putting these together, we propose a 1-to-8b scalable-precision general-purpose deep learning accelerator that supports multiply-and-accumulate (MAC) operations with input and weight vectors quantized by AQ and AB, respectively. The accelerator includes three main features: 1) a zero-elimination scheme that works with two data formats, raw and RLC, to save storage cost and improve effective bandwidth, 2) extended-precision AQ computing hardware without exploding logic complexity, and 3) bit-serial AB processing without unnecessary computations. © 2023 IEEE.
dc.format.extent: 3
dc.language: English
dc.language.iso: ENG
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.title: A 127.8TOPS/W Arbitrarily Quantized 1-to-8b Scalable-Precision Accelerator for General-Purpose Deep Learning with Reduction of Storage, Logic and Latency Waste
dc.type: Article
dc.identifier.doi: 10.1109/ISSCC42615.2023.10067615
dc.identifier.scopusid: 2-s2.0-85151722875
dc.identifier.bibliographicCitation: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, v.2023-February, pp. 330-332
dc.citation.title: Digest of Technical Papers - IEEE International Solid-State Circuits Conference
dc.citation.volume: 2023-February
dc.citation.startPage: 330
dc.citation.endPage: 332
dc.type.docType: Conference Paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
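
The adaptive raw/RLC storage described in the abstract can be illustrated with a minimal software sketch. The 50% switch point, the (zero_run, value) token layout, and the function names below are illustrative assumptions for exposition, not the storage format of the reported hardware.

# Minimal sketch of sparsity-aware format selection (assumed 50% threshold
# and (zero_run, value) token layout; the real accelerator does this in hardware).
from typing import List, Tuple

SPARSITY_THRESHOLD = 0.5  # assumed switch point between raw and RLC storage

def rlc_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length code a vector as (number_of_preceding_zeros, nonzero_value) pairs."""
    pairs, zero_run = [], 0
    for v in values:
        if v == 0:
            zero_run += 1
        else:
            pairs.append((zero_run, v))
            zero_run = 0
    if zero_run:
        pairs.append((zero_run, 0))  # trailing zeros, marked with a zero value
    return pairs

def rlc_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand the (zero_run, value) pairs back into the original vector."""
    out: List[int] = []
    for zero_run, v in pairs:
        out.extend([0] * zero_run)
        if v != 0:
            out.append(v)
    return out

def pack(values: List[int]):
    """Store raw when mostly dense, RLC when mostly zeros, to avoid wasting bandwidth."""
    sparsity = values.count(0) / len(values)
    if sparsity >= SPARSITY_THRESHOLD:
        return "rlc", rlc_encode(values)
    return "raw", list(values)

if __name__ == "__main__":
    dense  = [3, 1, 4, 1, 5, 9, 2, 6]   # low sparsity  -> kept raw
    sparse = [0, 0, 7, 0, 0, 0, 2, 0]   # high sparsity -> compressed with RLC
    for vec in (dense, sparse):
        fmt, payload = pack(vec)
        print(fmt, payload)
        assert fmt == "raw" or rlc_decode(payload) == vec

On the sparse vector above, the RLC payload holds three tokens instead of eight raw values, while the dense vector keeps the raw path and avoids per-element run-length overhead, mirroring the bandwidth argument in the abstract.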
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Engineering > Department of Electronic Engineering > Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Son, Hyun Woo
College of IT Engineering (School of Electronic Engineering)
