DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

08/28/2021
by Minsik Cho, et al.

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression-accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on ResNet50 with a 3.3 MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with a 0.74 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy and a 33% relatively smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on GLUE NLP benchmarks.

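To make the "k-means as attention" idea concrete, the PyTorch sketch below shows one soft clustering iteration: each weight attends to every centroid through a distance-based softmax, centroids are re-estimated from the soft assignments, and the weight is replaced by its attention-weighted mixture of centroids. Because every step is differentiable, gradients flow to both the weights and the centroids. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name dkm_soft_cluster, the temperature value, and the single-iteration simplification (the paper iterates the assignment until it stabilizes) are choices made here for brevity.

```python
import torch

def dkm_soft_cluster(weights, centroids, temperature=1e-3):
    """One attention-style soft k-means iteration (illustrative sketch).

    weights:   (N, 1) flattened layer weights
    centroids: (K, 1) cluster centers
    Returns softly clustered weights and updated centroids; all ops are
    differentiable, so gradients reach both weights and centroids.
    """
    # Attention logits: negative distance from each weight to each centroid.
    dist = torch.cdist(weights, centroids)               # (N, K)
    attn = torch.softmax(-dist / temperature, dim=-1)    # soft assignments (N, K)

    # Soft centroid update: differentiable analogue of the k-means M-step.
    new_centroids = (attn.t() @ weights) / attn.sum(dim=0, keepdim=True).t()

    # Replace each weight with its attention-weighted centroid mixture.
    clustered = attn @ new_centroids                     # (N, 1)
    return clustered, new_centroids
```

In a train-time compression loop of this kind, the clustered weights would be used in the forward pass while the optimizer updates the original weights and centroids through the soft assignment, which is what allows the loss function and model architecture to stay unchanged.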