Optimization is ubiquitous. While derivative-based algorithms have been...
Sycophancy is an undesirable behavior where models tailor their response...
Recent research shows the potential of enhancing the problem-solving abi...
Social alignment in AI systems aims to ensure that these models behave a...
The explosive growth of language models and their applications has led ...
Pretraining is the preliminary and fundamental step in developing capabl...
In this paper, we aim to optimize a contrastive loss with individualized...
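The truncation cuts off what is individualized, but losses in this family typically extend the standard InfoNCE objective; for an anchor embedding $z_i$ with positive $z_i^+$, similarity $s(\cdot,\cdot)$, and temperature $\tau$:

$$
\mathcal{L}_i = -\log \frac{\exp\big(s(z_i, z_i^+)/\tau\big)}{\sum_{j=1}^{N} \exp\big(s(z_i, z_j)/\tau\big)}
$$

If "individualized" refers to per-sample quantities, a natural reading is a per-anchor parameter such as $\tau_i$ in place of the global $\tau$, but that detail is not recoverable from the truncated abstract.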
We present symbol tuning - finetuning language models on in-context inpu...
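As a rough sketch of what such finetuning data could look like (an assumption from the name and the visible phrase "in-context inpu[t-label pairs]": natural-language labels in the in-context examples are remapped to arbitrary symbols, so the model must infer the task from the mappings alone; the labels and symbols below are invented for illustration):

```python
def symbol_tuning_prompt(examples, query, symbols=("foo", "bar")):
    """Build an in-context prompt whose labels are arbitrary symbols.

    Sketch only: remaps each natural-language label (e.g. "positive")
    to a semantically unrelated symbol (e.g. "foo"), which is the core
    move the name "symbol tuning" suggests.
    """
    labels = sorted({label for _, label in examples})
    mapping = dict(zip(labels, symbols))
    lines = [f"Input: {text}\nLabel: {mapping[label]}"
             for text, label in examples]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
print(symbol_tuning_prompt(examples, "Utterly forgettable."))
```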
Large language models (LLMs) have achieved impressive performance on cod...
We study how in-context learning (ICL) in language models is affected by...
Large language models have achieved impressive performance on various na...
We study the design decisions of publicly available instruction tuning m...
Neural sequence models, especially transformers, exhibit a remarkable ca...
Careful prompt design is critical to the use of large language models in...
Scaling language models improves performance but comes with significant...
Successful and effective communication between humans and AI relies on a...
We evaluate the reasoning abilities of large language models in multilin...
We propose a new paradigm to help Large Language Models (LLMs) generate ...
Humans can reason compositionally when presented with new tasks. Previou...
Recent research has shown that rationales, or step-by-step chains of tho...
We propose a novel prompting strategy, least-to-most prompting, that ena...
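A minimal sketch of the two-stage pattern the name implies (first decompose the problem into easier subproblems, then solve them in order, feeding each answer into the next prompt); `llm` is a hypothetical text-completion function, not an interface from the paper:

```python
def least_to_most(question, llm):
    # Stage 1: ask the model to decompose the problem.
    plan = llm("Decompose this problem into simpler subproblems, "
               f"one per line, easiest first:\n{question}")
    subproblems = [s.strip() for s in plan.splitlines() if s.strip()]

    # Stage 2: solve subproblems sequentially; each solved subproblem
    # and its answer are appended to the context for the next one.
    context = question
    answer = ""
    for sub in subproblems:
        answer = llm(f"{context}\n\nQ: {sub}\nA:")
        context += f"\n\nQ: {sub}\nA: {answer}"
    return answer  # answer to the final, hardest subproblem
```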
Large language models have been shown to achieve remarkable performance...
Transformer-based models generally allocate the same amount of computati...
We explore a simple ensemble strategy, self-consistency, that significan...
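A minimal sketch of the strategy as commonly described (sample several chain-of-thought completions at nonzero temperature, then majority-vote over the extracted final answers); `generate` and `extract_answer` are hypothetical stand-ins, not APIs from the paper:

```python
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=10):
    # Sample diverse reasoning paths; temperature > 0 makes them differ.
    answers = [extract_answer(generate(prompt, temperature=0.7))
               for _ in range(n_samples)]
    # Marginalize out the reasoning by majority vote over final answers.
    return Counter(answers).most_common(1)[0][0]
```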
In this paper, we study contrastive learning from an optimization perspe...
This work targets the automated design and scaling of Vision Transformers...
Although scaling up language model size has reliably improved performanc...
This work presents a simple vision transformer design as a strong baseli...
Knowledge graphs (KGs) capture knowledge in the form of head–relation–ta...
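The truncated phrase is presumably "head–relation–tail triples"; a one-line illustration of that representation (entities and relations invented for the example):

```python
# A knowledge graph as a set of (head, relation, tail) triples.
kg = {
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "European_Union"),
}
```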
We propose a simple and efficient approach for training the BERT model. ...
Spreadsheet formula prediction has been an important program synthesis p...
Despite achieving tremendous success, existing deep learning models have...
Natural Language Processing (NLP) has recently achieved great success by...
Off-policy estimation for long-horizon problems is important in many rea...
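For context on why horizon length matters, the standard trajectory-level importance-sampling estimator of a target policy $\pi$ from $n$ trajectories drawn under a behavior policy $\mu$ is

$$
\hat{V}(\pi) = \frac{1}{n}\sum_{i=1}^{n} \left( \prod_{t=0}^{H-1} \frac{\pi(a_t^{(i)} \mid s_t^{(i)})}{\mu(a_t^{(i)} \mid s_t^{(i)})} \right) \sum_{t=0}^{H-1} \gamma^t\, r_t^{(i)},
$$

and the variance of the product of ratios can grow exponentially with the horizon $H$ (the "curse of horizon"). Whether this paper builds on this particular estimator is not recoverable from the truncated abstract.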
Recent empirical works show that large deep neural networks are often hi...
Clinical forecasting based on electronic medical records (EMR) can uncov...
Pre-trained deep neural network language models such as ELMo, GPT, BERT ...
We propose the Neural Logic Machine (NLM), a neural-symbolic architectur...
Computations for the softmax function are significantly expensive when t...
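The truncation cuts off the condition, but the usual point is that the full softmax costs time and memory linear in the number of output classes. A small numerically stable sketch whose normalizing sum over all V logits is the bottleneck for large vocabularies:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Every normalization touches all V logits; for V = 100k this O(V) cost
# per prediction is why approximations (sampled, hierarchical, or
# adaptive softmax) are commonly used for large output spaces.
probs = softmax(np.random.randn(100_000))
```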