Reward design is a fundamental, yet challenging aspect of practical rein...
Transformer models have achieved remarkable results in various natural l...
Large language models are powerful text processors and reasoners, but ar...
Generalization to unseen tasks is an important ability for few-shot lear...
In-context learning is the paradigm that adapts large language models to...
Diffusion models have gained significant attention in the realm of image...
Code execution is a fundamental aspect of programming language semantics...
We present Prompt Diffusion, a framework for enabling in-context learnin...
Diffusion models are powerful, but they require a lot of time and data t...
Evaluating the general abilities of foundation models to tackle human-le...
Many natural language processing (NLP) tasks rely on labeled data to tra...
The task of repository-level code completion is to continue writing the ...
Fine-tuning large pre-trained language models on downstream tasks has be...
Most language models (LMs) are trained and applied in an autoregressive ...
Large language models (LLMs), such as ChatGPT, are able to generate huma...
Large language models can perform various reasoning tasks by using chain...
In this paper, we propose a large-scale language pre-training for text G...
Pre-trained language models have achieved promising success in code retr...
Fine-tuning large language models for different tasks can be costly and ...
We introduce GENIUS: a conditional text generation model using sketches ...
Sampling proper negatives from a large document pool is vital to effecti...
Code contrastive pre-training has recently achieved significant progress...
Layer-wise distillation is a powerful tool to compress large models (i.e...
The task of generating code solutions for a given programming problem ca...
The information in tables can be an important complement to text, making...
Due to exposure bias, most existing natural language generation (NLG) mo...
Large Transformer-based models have exhibited superior performance in va...
Code generation is a longstanding challenge, aiming to generate a code s...
Large language models such as GPT-3 and PaLM have shown remarkable perfo...
For stable training of generative adversarial networks (GANs), injecting...
Non-autoregressive generation is a sequence generation paradigm, which r...
Active learning, which effectively collects informative unlabeled data f...
Dialog response generation in open domain is an important research topic...
Pre-trained language models have demonstrated superior performance in va...
Model ensemble is a popular approach to produce a low-variance and well-...
Hyperparameter (HP) tuning in deep learning is an expensive process, pro...
Recently the prompt-tuning paradigm has attracted significant attention....
To guide the generation of large pretrained language models (LM), previo...
Employing a forward Markov diffusion chain to gradually map the data to ...
Token-mixing multi-layer perceptron (MLP) models have shown competitive ...
Recent research has shown the existence of significant redundancy in lar...
Reasoning over natural language is a long-standing goal for the research...
In this paper, we propose the CodeRetriever model, which combines the un...
Virtual support agents have grown in popularity as a way for businesses ...
This paper presents a new pre-trained language model, DeBERTaV3, which i...
Gigantic pre-trained models have become central to natural language proc...
Large pretrained vision-language (VL) models can learn a new task with a...
Current dense text retrieval models face two typical challenges. First, ...
Cross-lingual pre-training has achieved great successes using monolingua...
Adversarial regularization can improve model generalization in many natu...