Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

01/31/2017
by Naveen Mellempudi, et al.

We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8-bits, thus enabling a sub 8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the number of ternary operations. We show that with a cluster size of N=4 on Resnet-101, we can achieve 71.8% TOP-1 accuracy, within 6% of the best full precision results, while replacing about 85% of all multiplications with 8-bit accumulations. Using the same method with 4-bit weights achieves 76.3% TOP-1 accuracy, which is within 2% of the full precision result. We also study the impact of the size of the cluster on both performance and accuracy: larger cluster sizes (N=64) can replace about 98% of the multiplications with ternary operations, but introduce a significant drop in accuracy, which necessitates fine-tuning the parameters by retraining the network at lower precision. To address this, we have also trained a low-precision Resnet-50 with 8-bit activations and ternary weights by pre-initializing the network with full precision weights, and achieve 68.9% TOP-1 accuracy within 4 epochs. Our final quantized model can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models.
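The core idea is simple to sketch: group a layer's filters into clusters of N, then ternarize each cluster with its own threshold and scaling factor, so most multiplications reduce to sign flips and 8-bit accumulations. The NumPy sketch below illustrates this under stated assumptions; the 0.7 * mean(|w|) threshold rule is a common ternary-weight heuristic (from Ternary Weight Networks) assumed here for illustration, and the function name `ternarize_clustered` is hypothetical. The paper's exact thresholding and scale-fitting may differ.

```python
import numpy as np

def ternarize_clustered(weights, cluster_size=4):
    """Ternarize a conv weight tensor of shape (out_channels, in_ch, kH, kW)
    with one scaling factor per cluster of `cluster_size` filters.

    Illustrative sketch only: the 0.7 * mean(|w|) threshold is an assumed
    heuristic, not necessarily the rule used in the paper.
    """
    out_channels = weights.shape[0]
    quantized = np.zeros_like(weights)
    scales = []
    for start in range(0, out_channels, cluster_size):
        cluster = weights[start:start + cluster_size]   # N filters share one scale
        threshold = 0.7 * np.mean(np.abs(cluster))      # assumed thresholding rule
        mask = np.abs(cluster) > threshold              # weights kept as +/-1
        # Scale = mean magnitude of the surviving weights in this cluster
        scale = np.abs(cluster[mask]).mean() if mask.any() else 0.0
        quantized[start:start + cluster_size] = scale * np.sign(cluster) * mask
        scales.append(scale)
    return quantized, np.array(scales)

# Usage: quantize a random "layer" of 64 filters in clusters of N=4
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, alphas = ternarize_clustered(w, cluster_size=4)
print("quantization MSE:", np.mean((w - w_q) ** 2))
```

With N=4, each scale is fit to only four filters, keeping the per-cluster quantization error small; growing N toward 64 shares one scale across many filters, raising the fraction of ternary operations at the cost of accuracy, which is exactly the trade-off the abstract describes.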
