There is growing interest in adapting large-scale language models usin...
The recent advance of self-supervised learning associated with the Trans...
Whereas diverse variations of diffusion models exist, expanding the line...
While model compression is increasingly important because of large neura...
Even though fine-grained pruning techniques achieve a high compression r...
Various post-training uniform quantization methods have usually been stu...
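As a concrete illustration of the post-training uniform quantization the entry above refers to, here is a minimal NumPy sketch of the symmetric variant: weights are rounded onto an evenly spaced integer grid after training, with no retraining. The helper name `quantize_uniform` and the 8-bit setting are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Symmetric post-training uniform quantization (sketch).

    Maps floats onto a uniform integer grid in [-(2^(b-1)-1), 2^(b-1)-1],
    then dequantizes back to floats so the rounding error is visible.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                        # dequantized approximation

w = np.random.randn(4, 4).astype(np.float32)
w_hat = quantize_uniform(w, num_bits=8)
# per-element rounding error is bounded by half a quantization step
```

Per-channel scales and calibration on a small held-out set are common refinements, but the tensor-wide scale above is the simplest form.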
The Transformer is widely used in Neural Machine Translation (NMT). De...
Quantization based on binary codes is gaining attention because each...
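One common 1-bit binary-coding scheme, sketched below under the assumption of a single code per tensor, approximates a weight tensor as a scale times a sign matrix, with the scale chosen to minimize the L2 reconstruction error (which works out to the mean absolute weight). This is a generic sketch, not the method of any specific entry above.

```python
import numpy as np

def binarize(w):
    """1-bit quantization: W ≈ alpha * sign(W).

    alpha = mean(|W|) minimizes ||W - alpha * sign(W)||_2 for a fixed
    sign pattern, so only one float scale plus one bit per weight is stored.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

w = np.random.randn(8, 8)
b = binarize(w)
# every entry of b has magnitude alpha; only its sign varies
```

Multi-bit extensions stack several such binary codes, greedily binarizing the residual left by the previous code.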
The number of parameters in deep neural networks (DNNs) is rapidly incre...
Low-rank approximation is an effective model compression technique to no...
Model compression techniques, such as pruning and quantization, are beco...
Pruning is an efficient model compression technique to remove redundancy...
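The simplest instance of the pruning technique named above is unstructured magnitude pruning: zero out the smallest-magnitude fraction of weights. The sketch below assumes a one-shot global threshold; iterative prune-and-retrain schedules are a common refinement.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.random.randn(100, 100)
w_pruned = magnitude_prune(w, sparsity=0.9)
# about 90% of the entries are now exactly zero
```

Fine-grained (per-weight) pruning like this reaches high sparsity but needs sparse kernels or dedicated hardware to realize a speedup, which is why structured variants prune whole rows, heads, or channels instead.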