Text-to-image generation (TTI) refers to the use of models that can ...
ChatGPT-like models have revolutionized various applications in artifici...
In the complex domain of large language models (LLMs), striking a balanc...
Post-training quantization (PTQ) has recently been shown to be a promising ...
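As background for this entry, here is a minimal sketch of what post-training quantization does in its simplest form: mapping trained float weights onto a low-bit integer grid without retraining. The symmetric per-tensor int8 scheme and the function names are illustrative assumptions, not the specific method of this paper.

    import numpy as np

    def quantize_int8(w):
        # Symmetric per-tensor quantization: scale floats onto [-127, 127].
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an approximation of the original weights.
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())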
Improving the deployment efficiency of transformer-based language models...
Recent advances in deep learning models come at the price of formidable ...
Large-scale transformer models have become the de-facto architectures fo...
How to efficiently serve ever-larger trained natural language models in ...
Extreme compression, particularly ultra-low bit precision (binary/ternar...
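For the binary/ternary regime mentioned here, a common baseline is ternary weight quantization: keep only the signs of the large weights plus a single scale. The 0.7 * mean(|w|) threshold below is the heuristic from ternary weight networks, used purely as an illustration; this paper's own scheme may differ.

    import numpy as np

    def ternarize(w):
        # Threshold small weights to zero, keep signs of the rest.
        delta = 0.7 * np.abs(w).mean()
        mask = np.abs(w) > delta
        t = np.sign(w) * mask                      # values in {-1, 0, +1}
        # Scale that best reconstructs the surviving weights.
        alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
        return t, alpha

    w = np.random.randn(256)
    t, alpha = ternarize(w)
    print("mse:", np.mean((w - alpha * t) ** 2))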
We propose an adaptive (stochastic) gradient perturbation method for dif...
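The truncation cuts off at "dif..."; assuming it continues into differentially private optimization, gradient perturbation usually means clipping per-example gradients and adding calibrated Gaussian noise, as in DP-SGD. The sketch below shows that generic pattern with illustrative constants; the adaptive element this paper proposes is not reproduced here.

    import numpy as np

    def private_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0):
        # Clip each per-example gradient to norm <= clip before averaging.
        clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        mean = np.mean(clipped, axis=0)
        # Gaussian noise scaled to the clipping bound (DP-SGD style).
        noise = np.random.normal(0.0, sigma * clip / len(per_example_grads),
                                 size=mean.shape)
        return params - lr * (mean + noise)

    params = np.zeros(3)
    grads = [np.random.randn(3) for _ in range(8)]
    params = private_step(params, grads)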
We propose a computationally-friendly adaptive learning rate schedule, "...
One of the challenges for current sequence-to-sequence (seq2seq) models ...
Inspired by human learning, researchers have proposed ordering examples ...
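Ordering examples from easy to hard is the core of curriculum learning. As a toy illustration only, the snippet below sorts a dataset by a difficulty score; using sequence length as the score is a common heuristic and an assumption here, not necessarily what this paper studies.

    def curriculum_order(examples, difficulty=len):
        # Present easier (lower-score) examples first.
        return sorted(examples, key=difficulty)

    data = ["short text", "a somewhat longer sentence",
            "the longest and presumably hardest example in this toy set"]
    for ex in curriculum_order(data):
        print(ex)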
The presence of outliers can significantly skew the paramete...
Normalization methods such as batch normalization are commonly used in o...
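For reference, the standard batch-normalization transform normalizes each feature over the mini-batch and then applies a learnable affine map. A minimal training-time forward pass, with gamma and beta left as scalars for simplicity:

    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        # Normalize each feature over the batch dimension (axis 0),
        # then rescale and shift with learnable gamma and beta.
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        return gamma * (x - mu) / np.sqrt(var + eps) + beta

    x = np.random.randn(32, 8)           # batch of 32 examples, 8 features
    y = batch_norm(x)
    print(y.mean(axis=0).round(6), y.std(axis=0).round(3))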
We prove that the norm version of the adaptive stochastic gradient metho...
Adaptive gradient methods like AdaGrad are widely used in optimizing neu...
Adaptive gradient methods such as AdaGrad and its variants update the st...
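The last few entries concern AdaGrad-style methods. For orientation, here is a sketch of the coordinate-wise AdaGrad update alongside its norm version (a single adaptive scalar step size driven by accumulated squared gradient norms), which appears to be what "the norm version" above refers to; the hyperparameters are illustrative.

    import numpy as np

    def adagrad_step(x, g, accum, lr=0.1, eps=1e-8):
        # Coordinate-wise AdaGrad: per-coordinate step sizes shrink with
        # the running sum of squared gradients.
        accum = accum + g * g
        return x - lr * g / (np.sqrt(accum) + eps), accum

    def adagrad_norm_step(x, g, b2, lr=0.1):
        # Norm version: one scalar step size, adapted by the running sum
        # of squared gradient norms.
        b2 = b2 + np.dot(g, g)
        return x - lr * g / np.sqrt(b2), b2

    x, accum = np.ones(2), np.zeros(2)
    g = np.array([0.5, -0.3])
    x, accum = adagrad_step(x, g, accum)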
Adjusting the learning rate schedule in stochastic gradient methods is a...