Optimization is ubiquitous. While derivative-based algorithms have been...
Sycophancy is an undesirable behavior where models tailor their response...
Recent research shows the potential of enhancing the problem-solving abi...
Social alignment in AI systems aims to ensure that these models behave a...
The explosive growth of language models and their applications has led ...
Pretraining is the preliminary and fundamental step in developing capabl...
In this paper, we aim to optimize a contrastive loss with individualized...
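The truncation cuts off what is individualized, but losses in this family typically extend the standard InfoNCE objective; for an anchor embedding $z_i$ with positive $z_i^+$, similarity $s(\cdot,\cdot)$, and temperature $\tau$:

$$
\mathcal{L}_i = -\log \frac{\exp\big(s(z_i, z_i^+)/\tau\big)}{\sum_{j=1}^{N} \exp\big(s(z_i, z_j)/\tau\big)}
$$

If "individualized" refers to per-sample quantities, a natural reading is a per-anchor parameter such as $\tau_i$ in place of the global $\tau$, but that detail is not recoverable from the truncated abstract.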
We present symbol tuning - finetuning language models on in-context inpu...
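As a rough sketch of what such finetuning data could look like (an assumption from the name and the visible phrase "in-context inpu[t-label pairs]": natural-language labels in the in-context examples are remapped to arbitrary symbols, so the model must infer the task from the mappings alone; the labels and symbols below are invented for illustration):

```python
def symbol_tuning_prompt(examples, query, symbols=("foo", "bar")):
    """Build an in-context prompt whose labels are arbitrary symbols.

    Sketch only: remaps each natural-language label (e.g. "positive")
    to a semantically unrelated symbol (e.g. "foo"), which is the core
    move the name "symbol tuning" suggests.
    """
    labels = sorted({label for _, label in examples})
    mapping = dict(zip(labels, symbols))
    lines = [f"Input: {text}\nLabel: {mapping[label]}"
             for text, label in examples]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
print(symbol_tuning_prompt(examples, "Utterly forgettable."))
```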
Large language models (LLMs) have achieved impressive performance on cod...
We study how in-context learning (ICL) in language models is affected by...
Large language models have achieved impressive performance on various na...
We study the design decisions of publicly available instruction tuning m...
Neural sequence models, especially transformers, exhibit a remarkable ca...
Careful prompt design is critical to the use of large language models in...
Scaling language models improves performance but comes with significant...
Successful and effective communication between humans and AI relies on a...
We evaluate the reasoning abilities of large language models in multilin...
We propose a new paradigm to help Large Language Models (LLMs) generate ...
Humans can reason compositionally when presented with new tasks. Previou...
Recent research has shown that rationales, or step-by-step chains of tho...
We propose a novel prompting strategy, least-to-most prompting, that ena...
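A minimal sketch of the two-stage pattern the name implies (first decompose the problem into easier subproblems, then solve them in order, feeding each answer into the next prompt); `llm` is a hypothetical text-completion function, not an interface from the paper:

```python
def least_to_most(question, llm):
    # Stage 1: ask the model to decompose the problem.
    plan = llm("Decompose this problem into simpler subproblems, "
               f"one per line, easiest first:\n{question}")
    subproblems = [s.strip() for s in plan.splitlines() if s.strip()]

    # Stage 2: solve subproblems sequentially; each solved subproblem
    # and its answer are appended to the context for the next one.
    context = question
    answer = ""
    for sub in subproblems:
        answer = llm(f"{context}\n\nQ: {sub}\nA:")
        context += f"\n\nQ: {sub}\nA: {answer}"
    return answer  # answer to the final, hardest subproblem
```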
Large language models have been shown to achieve remarkable performance...
Transformer-based models generally allocate the same amount of computati...
We explore a simple ensemble strategy, self-consistency, that significan...
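A minimal sketch of the strategy as commonly described (sample several chain-of-thought completions at nonzero temperature, then majority-vote over the extracted final answers); `generate` and `extract_answer` are hypothetical stand-ins, not APIs from the paper:

```python
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=10):
    # Sample diverse reasoning paths; temperature > 0 makes them differ.
    answers = [extract_answer(generate(prompt, temperature=0.7))
               for _ in range(n_samples)]
    # Marginalize out the reasoning by majority vote over final answers.
    return Counter(answers).most_common(1)[0][0]
```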
In this paper, we study contrastive learning from an optimization perspe...
This work targets the automated design and scaling of Vision Transformers...
Although scaling up language model size has reliably improved performanc...
This work presents a simple vision transformer design as a strong baseli...
Knowledge graphs (KGs) capture knowledge in the form of head–relation–ta...
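The truncated phrase is presumably "head–relation–tail triples"; a one-line illustration of that representation (entities and relations invented for the example):

```python
# A knowledge graph as a set of (head, relation, tail) triples.
kg = {
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "European_Union"),
}
```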
We propose a simple and efficient approach for training the BERT model. ...
Spreadsheet formula prediction has been an important program synthesis p...
Despite achieving tremendous success, existing deep learning models have...
Natural Language Processing (NLP) has recently achieved great success by...
Off-policy estimation for long-horizon problems is important in many rea...
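For context on why horizon length matters, the standard trajectory-level importance-sampling estimator of a target policy $\pi$ from $n$ trajectories drawn under a behavior policy $\mu$ is

$$
\hat{V}(\pi) = \frac{1}{n}\sum_{i=1}^{n} \left( \prod_{t=0}^{H-1} \frac{\pi(a_t^{(i)} \mid s_t^{(i)})}{\mu(a_t^{(i)} \mid s_t^{(i)})} \right) \sum_{t=0}^{H-1} \gamma^t\, r_t^{(i)},
$$

and the variance of the product of ratios can grow exponentially with the horizon $H$ (the "curse of horizon"). Whether this paper builds on this particular estimator is not recoverable from the truncated abstract.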
Recent empirical works show that large deep neural networks are often hi...
Clinical forecasting based on electronic medical records (EMR) can uncov...
Pre-trained deep neural network language models such as ELMo, GPT, BERT ...
We propose the Neural Logic Machine (NLM), a neural-symbolic architectur...
Computations for the softmax function are significantly expensive when t...
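The truncation cuts off the condition, but the usual point is that the full softmax costs time and memory linear in the number of output classes. A small numerically stable sketch whose normalizing sum over all V logits is the bottleneck for large vocabularies:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Every normalization touches all V logits; for V = 100k this O(V) cost
# per prediction is why approximations (sampled, hierarchical, or
# adaptive softmax) are commonly used for large output spaces.
probs = softmax(np.random.randn(100_000))
```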