The rapid advancement of large language models (LLMs) has revolutionized...
As the size of deep learning models continues to grow, finding optimal m...
Prompt-tuning has become an increasingly popular parameter-efficient met...
Recent voice assistants are usually based on the cascade spoken language...
Developing monolingual large Pre-trained Language Models (PLMs) is shown...
While pre-trained language models (PLMs) have shown evidence of acquirin...
Large self-supervised pre-trained speech models have achieved remarkable...
The ever-increasing size of language models curtails their widespread ac...
Unsupervised speech models are becoming ubiquitous in the speech and mac...
Regularization techniques are crucial to improving the generalization pe...
The advent of multilingual language models has generated a resurgence of...
Self-supervised speech pre-training enables deep neural network models t...
Knowledge distillation (KD) is one of the prominent techniques for model...
Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream ...
Knowledge Distillation (KD) has been extensively used for natural langua...
Knowledge Distillation (KD) is a commonly used technique for improving t...
Self-supervised speech representation learning aims to extract meaningfu...
With the ever-growing size of pre-trained models (PMs), fine-tuning them...
Transformer-based models are used to achieve state-of-the-art performanc...
We propose a general deep architecture for learning functions on multipl...
Knowledge Distillation (KD) is a prominent neural model compression tech...
Recurrent models have been dominating the field of neural machine transl...
Knowledge distillation (KD) is an efficient framework for compressing la...
Data Augmentation (DA) is known to improve the generalizability of deep...
Slot-filling and intent detection are the backbone of conversational age...
With the ever-growing scale of neural models, knowledge distillation (KD) at...
GPT is an auto-regressive Transformer-based pre-trained language model w...
Intermediate layer knowledge distillation (KD) can improve the standard ...
Knowledge Distillation (KD) is extensively used to compress and deploy l...
The development of over-parameterized pre-trained language models has ma...
Knowledge Distillation (KD) is a model compression algorithm that helps...
Existing Natural Language Understanding (NLU) models have been shown to...
In this work, we examine the ability of NER models to use contextual inf...
In Natural Language Processing (NLP), finding data augmentation techniqu...
The advent of large pre-trained language models has given rise to rapid...
Despite recent monumental advances in the field, many Natural Language P...
Significant memory and computational requirements of large deep neural n...
End-to-end automatic speech recognition (ASR), unlike conventional ASR, ...
Adversarial training of end-to-end (E2E) ASR systems using generative ad...
Knowledge Distillation (KD) is a common knowledge transfer algorithm use...
While significant improvements have been made in recent years in terms o...
State-of-the-art neural machine translation methods employ massive amoun...
Word embeddings are a vital component of Natural Language Processing (NL...
We present the first sentence simplification model that learns explicit ...
Text generation is of particular interest in many NLP applications such ...
Text generation with generative adversarial networks (GANs) can be divid...
Latent-space-based GAN methods and attention-based sequence-to-sequence...
Inspired by the success of the self-attention mechanism and Transformer arch...