Counting Cards: Exploiting Weight and Variance Distributions for Robust Compute In-Memory
Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data-intensive applications. CIM has found widespread adoption in accelerating neural networks for machine learning applications. Using a crossbar architecture with emerging non-volatile memories (eNVM), such as dense resistive random access memory (RRAM) or phase change random access memory (PCRAM), various forms of neural networks can be implemented to greatly reduce power and increase on-chip memory capacity. However, compute in-memory faces its own limitations at both the circuit and the device levels. In this work, we explore the impact of device variation and peripheral circuit design constraints. Furthermore, we propose a new algorithm based on device variance and neural network weight distributions to increase both performance and accuracy for compute in-memory based designs. We demonstrate a 27 performance improvement for low and high variance eNVM, while satisfying a programmable threshold for a target error tolerance, which depends on the application.
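To make the effect of device variation concrete, the following minimal sketch models a crossbar matrix-vector product in which every stored weight is perturbed independently. It is not the algorithm proposed in this work, which the abstract does not specify. The Gaussian relative-noise model, the function names `crossbar_mvm` and `max_abs_error`, and the parameter `sigma` are all assumptions made for illustration only.

```python
import random


def crossbar_mvm(weights, inputs, sigma, rng):
    """Matrix-vector product on a hypothetical crossbar with device variation.

    weights: list of rows, one row per output
    inputs:  list of input activations
    sigma:   relative standard deviation of each stored weight
             (assumed Gaussian; an illustrative model, not from the paper)
    rng:     random.Random instance supplying the noise
    """
    outputs = []
    for row in weights:
        total = 0.0
        for w, x in zip(row, inputs):
            # Each stored weight is perturbed independently by device variation.
            noisy_w = w * (1.0 + rng.gauss(0.0, sigma))
            total += noisy_w * x
        outputs.append(total)
    return outputs


def max_abs_error(weights, inputs, sigma, trials=200, seed=0):
    """Largest deviation from the ideal output observed over several trials."""
    rng = random.Random(seed)
    # With sigma = 0 the perturbation vanishes, giving the ideal result.
    ideal = crossbar_mvm(weights, inputs, 0.0, rng)
    worst = 0.0
    for _ in range(trials):
        noisy = crossbar_mvm(weights, inputs, sigma, rng)
        worst = max(worst, max(abs(a - b) for a, b in zip(ideal, noisy)))
    return worst


if __name__ == "__main__":
    weights = [[0.5, -0.25], [1.0, 0.75]]
    inputs = [1.0, 2.0]
    for sigma in (0.05, 0.3):
        print(sigma, max_abs_error(weights, inputs, sigma))
```

Under this assumed model, comparing the observed error against an application-specific tolerance is one simple way to reason about whether a given level of device variance is acceptable.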