Can SGD Learn Recurrent Neural Networks with Provable Generalization?

02/04/2019
by   Zeyuan Allen-Zhu, et al.

Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. However, due to the complexity introduced by their recurrent structure, they remain one of the least theoretically understood neural-network models. In particular, existing generalization bounds for RNNs mostly scale exponentially with the length of the input sequence, which limits their practical relevance. In this paper, we show that if the input labels are (approximately) realizable by certain classes of (non-linear) functions of the input sequences, then using vanilla stochastic gradient descent (SGD), RNNs can actually learn them efficiently, meaning that both the training time and sample complexity scale only polynomially with the input length (or almost polynomially, depending on the class of non-linearity).
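For concreteness, the setting studied here corresponds to the standard practice of training an Elman-style RNN with plain SGD on sequences whose labels come from some (approximately) realizable target. The sketch below is only an illustration of that setup, not the paper's construction or analysis: the toy target function, the architecture sizes, and all hyperparameters are assumptions chosen for readability.

```python
# Minimal sketch (illustrative, not the paper's method): train an Elman RNN
# with vanilla SGD on sequences whose labels are produced by a smooth,
# realizable function of the input sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_dim, hidden_dim, n_samples = 10, 5, 64, 2048

def target_fn(x):
    # Toy realizable target: a smooth function of the whole input sequence.
    # x: (batch, seq_len, input_dim) -> (batch, 1)
    return torch.tanh(x.sum(dim=(1, 2))).unsqueeze(1)

X = torch.randn(n_samples, seq_len, input_dim)
y = target_fn(X)

class ElmanRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # Single-layer Elman RNN with ReLU activation.
        self.rnn = nn.RNN(input_dim, hidden_dim, nonlinearity="relu",
                          batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.rnn(x)            # out: (batch, seq_len, hidden_dim)
        return self.head(out[:, -1])    # read out from the last hidden state

model = ElmanRegressor()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)   # vanilla SGD, no momentum
loss_fn = nn.MSELoss()

for epoch in range(50):
    perm = torch.randperm(n_samples)
    for i in range(0, n_samples, 64):
        idx = perm[i:i + 64]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()

print("final training loss:", loss_fn(model(X), y).item())
```

In this toy run the labels are exactly realizable by a simple function of the inputs, which mirrors the realizability assumption under which the paper proves that training time and sample complexity scale polynomially in the sequence length.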
