Ankit Singh Rawat

07/06/2023

When Does Confidence-Based Cascade Deferral Suffice?

Cascades are a classical strategy to enable inference cost to vary adapt...

Wittawat Jitkrittum, et al.

06/06/2023

On the Role of Attention in Prompt-tuning

Prompt-tuning is an emerging strategy to adapt large language models (LL...

Samet Oymak, et al.

02/03/2023

ResMem: Learn what you can and memorize the rest

The impressive generalization performance of modern neural networks is a...

Zitong Yang, et al.

01/28/2023

Supervision Complexity and its Role in Knowledge Distillation

Despite the popularity and efficacy of knowledge distillation, there is ...

Hrayr Harutyunyan, et al.

01/27/2023

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Large neural models (such as Transformers) achieve state-of-the-art perf...

Seungyeon Kim, et al.

11/09/2022

Large Language Models with Controllable Working Memory

Large language models (LLMs) have led to a series of breakthroughs in na...

Daliang Li, et al.

10/12/2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

This paper studies the curious phenomenon for machine learning models wi...

Zonglin Li, et al.

10/06/2022

Generalization Properties of Retrieval-based Models

Many modern high-performing machine learning models such as GPT-3 primar...

Soumya Basu, et al.

10/05/2022

A Fourier Approach to Mixture Learning

We revisit the problem of learning mixtures of spherical Gaussians. Give...

Mingda Qiao, et al.

04/27/2022

ELM: Embedding and Logit Margins for Long-Tail Learning

Long-tail learning is the problem of learning under skewed label distrib...

Wittawat Jitkrittum, et al.

01/28/2022

FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients

In classical federated learning, the clients contribute to the overall t...

Jianyu Wang, et al.

10/19/2021

When in Doubt, Summon the Titans: Efficient Inference with Large Models

Scaling neural networks to "large" sizes, with billions of parameters, h...

Ankit Singh Rawat, et al.

05/12/2021

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Negative sampling schemes enable efficient training given a large number...

Ankit Singh Rawat, et al.

02/13/2021

Distilling Double Descent

Distillation is the technique of training a "student" model based on exa...

Andrew Cotter, et al.

02/05/2021

On the Reproducibility of Neural Network Predictions

Standard training techniques for neural networks involve multiple source...

Srinadh Bhojanapalli, et al.

12/01/2020

Modifying Memories in Transformer Models

Large Transformer models have achieved impressive performance in many na...

Chen Zhu, et al.

06/08/2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Transformer networks use pairwise attention to compute contextual embedd...

Chulhee Yun, et al.

05/21/2020

Why distillation helps: a statistical perspective

Knowledge distillation is a technique for improving the performance of a...

Aditya Krishna Menon, et al.

04/23/2020

Doubly-stochastic mining for heterogeneous retrieval

Modern retrieval problems are characterised by training sets with potent...

Ankit Singh Rawat, et al.

04/21/2020

Federated Learning with Only Positive Labels

We consider learning a multi-class classification model in the federated...

Felix X. Yu, et al.

04/11/2020

Robust Large-Margin Learning in Hyperbolic Space

Recently, there has been a surge of interest in representation learning ...

Melanie Weber, et al.

02/20/2020

Reliable Distributed Clustering with Redundant Data Assignment

In this paper, we present distributed generalized clustering algorithms ...

Venkata Gandikota, et al.

02/17/2020

Low-Rank Bottleneck in Multi-head Attention Models

Attention based Transformer architecture has enabled significant advance...

Srinadh Bhojanapalli, et al.

01/27/2020

Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques

Many performance critical systems today must rely on performance enhance...

Hardik Jain, et al.

12/20/2019

Are Transformers universal approximators of sequence-to-sequence functions?

Despite the widespread adoption of Transformer models for NLP tasks, the...

Chulhee Yun, et al.

07/24/2019

Sampled Softmax with Random Fourier Features

The computational cost of training with softmax cross entropy loss grows...

Ankit Singh Rawat, et al.

07/18/2018

The Generalized Lasso for Sub-gaussian Measurements with Dithered Quantization

In the problem of structured signal recovery from high-dimensional linea...

Christos Thrampoulidis, et al.

05/22/2018

Robust Gradient Descent via Moment Encoding with LDPC Codes

This paper considers the problem of implementing large-scale gradient de...

Raj Kumar Maity, et al.

03/12/2018

Representation Learning and Recovery in the ReLU Model

Rectified linear units, or ReLUs, have become the preferred activation f...

Arya Mazumdar, et al.

12/11/2017

The PhaseLift for Non-quadratic Gaussian Measurements

We study the problem of recovering a structured signal x_0 from high-dim...

Christos Thrampoulidis, et al.

09/24/2017

MDS Code Constructions with Small Sub-packetization and Near-optimal Repair Bandwidth

This paper addresses the problem of constructing MDS codes that enable e...

Ankit Singh Rawat, et al.
