A common training technique for language models is teacher forcing (TF)....
Despite the rich existing literature about minimax optimization in conti...
Large, general purpose language models have demonstrated impressive
perf...
Evaluating clustering quality with reliable evaluation metrics like
norm...
Not all data in a typical training set help with generalization; some sa...
Traditional text classifiers are limited to predicting over a fixed set ...
Deep generative networks can simulate from a complex target distribution...
In many machine learning applications, it is important to explain the
pr...