Ternary Residual Networks

07/15/2017
by Abhisek Kundu, et al.

Sub-8-bit representation of DNNs incurs some discernible loss of accuracy despite rigorous (re)training at low precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this accuracy drop, we introduce the notion of residual networks, where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, turns out to be sophisticated enough to achieve very high accuracy (∼ 1% drop from our FP-32 baseline), despite ∼ 1.6× reduction in model size, ∼ 26× reduction in number of multiplications, and potentially ∼ 2× power-performance gain compared to the 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on the ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded/downgraded on the fly by partially enabling/disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is ∼ 2% (from FP-32), despite ∼ 1.9× reduction in model size, ∼ 32× reduction in number of multiplications, and potentially ∼ 2.3× power-performance gain compared to the 8-8 representation. Finally, all the ternary connections are sparse in nature, and the ternary residual conversion can be done in a resource-constrained setting with no low-precision (re)training.
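To make the residual idea concrete, the following is a minimal NumPy sketch of approximating a weight tensor by a base ternary term plus ternary residual terms fitted to the remaining quantization error. It assumes a generic threshold-based ternarization (the 0.7·mean(|w|) heuristic is an assumption, not necessarily the paper's scheme), and it does not reproduce the paper's perturbation-theory selection of sensitive edges; `ternarize` and `ternary_residual` are hypothetical helper names introduced here for illustration only.

    import numpy as np

    def ternarize(w, delta=None):
        # Generic threshold ternarization (illustrative, not the paper's exact method):
        # map each weight to {-1, 0, +1} with a single per-tensor scale alpha.
        if delta is None:
            delta = 0.7 * np.mean(np.abs(w))   # assumed heuristic threshold
        t = np.zeros_like(w)
        t[w > delta] = 1.0
        t[w < -delta] = -1.0
        mask = t != 0
        alpha = np.mean(np.abs(w[mask])) if mask.any() else 0.0
        return alpha, t                        # w ≈ alpha * t

    def ternary_residual(w, num_residual=1):
        # Base ternary term plus `num_residual` ternary terms fitted to the
        # leftover quantization error; each extra term is a "residual edge".
        terms, r = [], w.copy()
        for _ in range(1 + num_residual):
            alpha, t = ternarize(r)
            terms.append((alpha, t))
            r = r - alpha * t                  # error passed to the next term
        return terms

    w = np.random.randn(256, 256).astype(np.float32)
    terms = ternary_residual(w, num_residual=1)
    base = terms[0][0] * terms[0][1]
    full = sum(a * t for a, t in terms)
    print("base-only relative error:", np.linalg.norm(w - base) / np.linalg.norm(w))
    print("with residual           :", np.linalg.norm(w - full) / np.linalg.norm(w))

The last two lines illustrate the upgrade/downgrade behavior described in the abstract: dropping the residual term recovers the cheaper base-only model at higher error, while keeping it reduces the approximation error at the cost of extra ternary connections.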
