Gradient Descent Converges to Ridgelet Spectrum
Deep learning achieves high generalization performance in practice, despite the non-convexity of the learning problem solved by gradient descent. Recently, the inductive bias in deep learning has been studied through the characterization of local minima. In this study, we use ridgelet analysis, a wavelet-like analysis developed for neural networks, to show that the distribution of parameters learned by gradient descent converges to a spectrum of the ridgelet transform. This convergence is stronger than those shown in previous results, and it guarantees that the shape of the parameter distribution is identified with the ridgelet spectrum. In numerical experiments with finite models, we visually confirm the resemblance between the distribution of learned parameters and the ridgelet spectrum. Our study provides a better understanding of the theoretical background of an inductive bias theory based on lazy regimes.
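To make the notion of a ridgelet spectrum concrete, the following is a minimal sketch that approximates the ridgelet transform R[f](a, b) = ∫ f(x) ρ(a·x − b) dx of a one-dimensional function by a Riemann sum over a grid of hidden-layer parameters (a, b). The target function, the ridgelet function ρ (a derivative of a Gaussian), the grid ranges, and the helper name `ridgelet_spectrum` are all illustrative assumptions and are not taken from the paper's experiments.

```python
import numpy as np

def ridgelet_spectrum(f, rho, a_grid, b_grid, x_grid):
    """Riemann-sum approximation of R[f](a, b) = integral of f(x) * rho(a*x - b) dx
    for scalar x. rho is assumed real-valued, so no complex conjugate is taken."""
    dx = x_grid[1] - x_grid[0]
    fx = f(x_grid)  # shape (n_x,)
    # a*x - b for every (a, b, x) triple: shape (n_a, n_b, n_x)
    z = a_grid[:, None, None] * x_grid[None, None, :] - b_grid[None, :, None]
    return np.sum(fx * rho(z), axis=-1) * dx  # shape (n_a, n_b)

def target(x):
    # Hypothetical target: one period of a sine wave on [-1, 1]
    return np.sin(2 * np.pi * x)

def rho(z):
    # Assumed ridgelet function: derivative of a Gaussian
    return -z * np.exp(-z ** 2 / 2)

a = np.linspace(-30, 30, 61)  # grid over hidden-layer weights (assumed range)
b = np.linspace(-30, 30, 61)  # grid over hidden-layer biases (assumed range)
x = np.linspace(-1, 1, 201)   # integration grid on the data domain

spectrum = ridgelet_spectrum(target, rho, a, b, x)
```

Under these assumptions, `spectrum[i, j]` approximates the ridgelet transform at `(a[i], b[j])`. Plotting it as a heat map over the (a, b) plane gives the kind of picture against which a scatter plot of learned hidden-layer parameters could be compared.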