Sparse training is emerging as a promising avenue for reducing the compu...
FP8 is a natural progression for accelerating deep learning training inf...
We propose a language and compiler to productively build high-performanc...
Code similarity systems are integral to a range of applications from cod...
The simplified parse tree (SPT) presented in Aroma, a state-of-the-art c...
We propose K-TanH, a novel, highly accurate, hardware efficient approxim...
This paper presents the first comprehensive empirical study demonstratin...
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
The state-of-the-art (SOTA) for mixed precision training is dominated by...
The exponential growth in use of large deep neural networks has accelera...
The nature of dark energy and the complete theory of gravity are two cen...
This paper presents the first, 15-PetaFLOP Deep Learning system for solv...
Sub-8-bit representation of DNNs incurs some discernible loss of accuracy...
We propose a novel fine-grained quantization (FGQ) method to ternarize p...
Word2vec is a widely used algorithm for extracting low-dimensional vecto...
Phenomenally successful in practical inference problems, convolutional n...
Word2Vec is a widely used algorithm for extracting low-dimensional vecto...
We propose BlackOut, an approximation algorithm to efficiently train mas...