Survey and Benchmarking of Precision-Scalable MAC Arrays for Embedded DNN Processing

08/10/2021
by   Ehab M. Ibrahim, et al.
0

Reduced-precision and variable-precision multiply-accumulate (MAC) operations provide opportunities to significantly improve energy efficiency and throughput of DNN accelerators with no/limited algorithmic performance loss, paving a way towards deploying AI applications on resource-constraint edge devices. Accordingly, various precision-scalable MAC array (PSMA) architectures were recently proposed. However, it is difficult to make a fair comparison between those alternatives, as each proposed PSMA is demonstrated in different systems with different technologies. This work aims to provide a clear view on the design space of PSMA and offer insights for selecting the optimal architectures based on designers' needs. First, we introduce a precision-enhanced for-loop representation for DNN dataflows. Next, we use this new representation towards a comprehensive PSMA taxonomy, capable to systematically cover most prominent state-of-the-art PSMAs, as well as uncover new PSMA architectures. Following that, we build a highly parameterized PSMA template that can be design-time configured into a huge subset of the design space spanned by the taxonomy. This allows to fairly and thoroughly benchmark 72 different PSMA architectures. We perform such studies in 28nm technology targeting run-time precision scalability from 8 to 2 bits, operating at 200 MHz and 1 GHz. Analyzing resulting energy efficiency and area breakdowns reveals key design guidelines for PSMA architectures.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset