Qingru Zhang
ML Ph.D. @ Georgia Tech
Transformer models have achieved remarkable results in various natural l...
Fine-tuning large pre-trained language models on downstream tasks has be...
Layer-wise distillation is a powerful tool to compress large models (i.e...
Large Transformer-based models have exhibited superior performance in va...
Pre-trained language models have demonstrated superior performance in va...
Graph neural networks (GNNs) have recently emerged as a vehicle for apply...
Adam has been shown to be unable to converge to the optimal solution in cert...