The rapid advancement of large language models (LLMs) has revolutionized...
As the size of deep learning models continues to grow, finding optimal m...
Prompt-tuning has become an increasingly popular parameter-efficient met...
Recent voice assistants are usually based on the cascade spoken language...
Developing monolingual large Pre-trained Language Models (PLMs) is shown...
While pre-trained language models (PLMs) have shown evidence of acquirin...
Large self-supervised pre-trained speech models have achieved remarkable...
The ever-increasing size of language models curtails their widespread ac...
Unsupervised speech models are becoming ubiquitous in the speech and mac...
Regularization techniques are crucial to improving the generalization pe...
The advent of multilingual language models has generated a resurgence of...
Self-supervised speech pre-training enables deep neural network models t...
Knowledge distillation (KD) is one of the prominent techniques for model...
Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream ...
Knowledge Distillation (KD) has been extensively used for natural langua...
Knowledge Distillation (KD) is a commonly used technique for improving t...
Self-supervised speech representation learning aims to extract meaningfu...
With the ever-growing size of pre-trained models (PMs), fine-tuning them...
Transformer-based models are used to achieve state-of-the-art performanc...
We propose a general deep architecture for learning functions on multipl...
Knowledge Distillation (KD) is a prominent neural model compression tech...
Recurrent models have been dominating the field of neural machine transl...
Knowledge distillation (KD) is an efficient framework for compressing la...
Data Augmentation (DA) is known to improve the generalizability of deep...
Slot-filling and intent detection are the backbone of conversational age...
With the ever-growing scale of neural models, knowledge distillation (KD) at...
GPT is an auto-regressive Transformer-based pre-trained language model w...
Intermediate layer knowledge distillation (KD) can improve the standard ...
Knowledge Distillation (KD) is extensively used to compress and deploy l...
The development of over-parameterized pre-trained language models has ma...
Knowledge Distillation (KD) is a model compression algorithm that helps...
Existing Natural Language Understanding (NLU) models have been shown to...
In this work, we examine the ability of NER models to use contextual inf...
In Natural Language Processing (NLP), finding data augmentation techniqu...
The advent of large pre-trained language models has given rise to rapid...
Despite recent monumental advances in the field, many Natural Language P...
Significant memory and computational requirements of large deep neural n...
End-to-end automatic speech recognition (ASR), unlike conventional ASR, ...
Adversarial training of end-to-end (E2E) ASR systems using generative ad...
Knowledge Distillation (KD) is a common knowledge transfer algorithm use...
While significant improvements have been made in recent years in terms o...
State-of-the-art neural machine translation methods employ massive amoun...
Word embeddings are a vital component of Natural Language Processing (NL...
We present the first sentence simplification model that learns explicit ...
Text generation is of particular interest in many NLP applications such ...
Text generation with generative adversarial networks (GANs) can be divid...
Latent-space-based GAN methods and attention-based sequence-to-sequence...
Inspired by the success of the self-attention mechanism and Transformer arch...