Qingru Zhang
ML Ph.D. @ Georgia Tech
Transformer models have achieved remarkable results in various natural l...
Fine-tuning large pre-trained language models on downstream tasks has be...
Layer-wise distillation is a powerful tool to compress large models (i.e...
Large Transformer-based models have exhibited superior performance in va...
Pre-trained language models have demonstrated superior performance in va...
Graph neural networks (GNNs) have recently emerged as a vehicle for apply...
Adam has been shown to be unable to converge to the optimal solution in cert...