Meta-Principled Family of Hyperparameter Scaling Strategies

10/10/2022
by Sho Yaida, et al.

In this note, we first derive a one-parameter family of hyperparameter scaling strategies that interpolates between the neural-tangent scaling and mean-field/maximal-update scaling. We then calculate the scalings of dynamical observables – network outputs, neural tangent kernels, and differentials of neural tangent kernels – for wide and deep neural networks. These calculations in turn reveal a proper way to scale depth with width such that resultant large-scale models maintain their representation-learning ability. Finally, we observe that various infinite-width limits examined in the literature correspond to the distinct corners of the interconnected web spanned by effective theories for finite-width neural networks, with their training dynamics ranging from being weakly-coupled to being strongly-coupled.
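
To make the interpolation concrete, below is a minimal numpy sketch of one way such a one-parameter family can be realized for a multilayer perceptron. The interpolation exponent s, the n^{-(1+s)/2} readout multiplier, the n^s learning-rate compensation, and all function names are illustrative assumptions rather than the parameterization derived in the paper; the only facts taken as given are the two corners: the 1/sqrt(n) readout factor of neural-tangent scaling at s = 0 and the 1/n factor of mean-field/maximal-update scaling at s = 1.

```python
import numpy as np

# Hypothetical one-parameter family of width scalings (illustrative only):
# s = 0 recovers neural-tangent (NTK) scaling, s = 1 the mean-field/maximal-update
# scaling; intermediate s interpolates between them. Notation is assumed, not the paper's.

def init_mlp(widths, rng):
    """Initialize weights with O(1) entries; width factors are applied in the forward pass."""
    return [rng.standard_normal((n_in, n_out)) for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(params, x, s):
    """Forward pass: hidden layers use the standard 1/sqrt(n) factor; the readout
    layer uses n^{-(1+s)/2}, interpolating NTK (s = 0) and mean-field (s = 1)."""
    h = x
    for W in params[:-1]:
        n_in = W.shape[0]
        h = np.tanh(h @ W / np.sqrt(n_in))
    n_out_in = params[-1].shape[0]
    return h @ params[-1] * n_out_in ** (-(1.0 + s) / 2.0)

def learning_rate(base_lr, width, s):
    """Assumed compensating rule: scale the learning rate up with width as n^s so that
    the stronger output suppression at larger s is offset and feature updates stay O(1).
    This is a heuristic stand-in, not the paper's derivation."""
    return base_lr * width ** s

rng = np.random.default_rng(0)
widths = [10, 512, 512, 1]   # input dim, two hidden layers of width n = 512, scalar output
params = init_mlp(widths, rng)
x = rng.standard_normal((4, widths[0]))

for s in (0.0, 0.5, 1.0):    # NTK corner, an intermediate point, mean-field corner
    y = forward(params, x, s)
    print(f"s={s}: output scale ~ {np.abs(y).mean():.4f}, lr = {learning_rate(0.1, 512, s):.3f}")
```

Sweeping s from 0 to 1 shrinks the network output at initialization while the compensating learning rate grows with width, which is the qualitative trade-off an interpolating family of scaling strategies has to balance.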

Related research

06/11/2020 · Dynamically Stable Infinite-Width Limits of Neural Classifiers
Recent research has been focused on two different approaches to studying...

11/22/2022 · Learning Deep Neural Networks by Iterative Linearisation
The excellent real-world performance of deep neural networks has receive...

08/19/2020 · Asymptotics of Wide Convolutional Neural Networks
Wide neural networks have proven to be a rich class of architectures for...

03/30/2023 · Neural signature kernels as infinite-width-depth-limits of controlled ResNets
Motivated by the paradigm of reservoir computing, we consider randomly i...

03/12/2020 · Towards a General Theory of Infinite-Width Limits of Neural Classifiers
Obtaining theoretical guarantees for neural networks training appears to...

12/10/2021 · Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks
Understanding the fundamental principles behind the massive success of n...

05/13/2023 · Depth Dependence of μP Learning Rates in ReLU MLPs
In this short note we consider random fully connected ReLU networks of w...
