Pre-training has achieved remarkable success when transferred to downstr...
Popular reinforcement learning (RL) algorithms tend to produce a unimoda...
In this paper, we introduce the Layer-Peeled Model, a nonconvex yet anal...
Gradient clipping is commonly used in training deep neural networks part...
We target the problem of finding a local minimum in non-convex finite-su...
In this paper, we prove that the simplest Stochastic Gradient Descent (S...
Zeroth-order optimization or derivative-free optimization is an importan...
We propose a new optimization method for training feed-forward neural ne...
In this paper, we propose a new technique named Stochastic Path-Integrat...
Asynchronous algorithms have attracted much attention recently due to th...