Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. However, due to the complexity arising from their recurrent structure, they remain one of the least theoretically understood neural-network models. In particular, existing generalization bounds for RNNs mostly scale exponentially with the length of the input sequence, which limits their practical implications. In this paper, we show that if the input labels are (approximately) realizable by certain classes of (non-linear) functions of the input sequences, then vanilla stochastic gradient descent (SGD) can actually learn them efficiently with RNNs, meaning that both the training time and the sample complexity scale only polynomially with the input length (or almost polynomially, depending on the class of non-linearity).
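The learning setup studied here, vanilla SGD applied to a recurrent network on sequence inputs whose labels are a non-linear function of the sequence, can be illustrated with a minimal sketch. The snippet below is not the paper's construction or parameter regime; the dimensions, initialization scale, target function, and learning rate are all illustrative assumptions. It trains a simple Elman-style RNN with plain SGD and backpropagation through time on synthetic, realizable labels.

```python
# Minimal sketch (illustrative assumptions, not the paper's construction):
# a vanilla Elman RNN trained with plain SGD on synthetic sequences whose
# labels are a smooth non-linear function of the inputs.
import numpy as np

rng = np.random.default_rng(0)
L, d, m = 10, 5, 64                        # sequence length, input dim, hidden width (assumed)
W = rng.normal(0, 1 / np.sqrt(m), (m, m))  # recurrent weights
A = rng.normal(0, 1 / np.sqrt(d), (m, d))  # input weights
b = rng.normal(0, 1 / np.sqrt(m), m)       # output weights
lr, steps = 1e-2, 2000

def target(x_seq):
    # Labels realizable by a non-linear function of the input sequence:
    # here, a sum of tanh's of the first coordinate at each time step.
    return np.sum(np.tanh(x_seq[:, 0]))

def forward(x_seq):
    # Elman recurrence h_t = relu(W h_{t-1} + A x_t); scalar output <b, h_L>.
    hs, pre = [np.zeros(m)], []
    for t in range(L):
        z = W @ hs[-1] + A @ x_seq[t]
        pre.append(z)
        hs.append(np.maximum(z, 0.0))
    return hs, pre, b @ hs[-1]

for step in range(steps):
    x_seq = rng.normal(size=(L, d))
    y = target(x_seq)
    hs, pre, y_hat = forward(x_seq)
    err = y_hat - y                        # gradient factor of the square loss

    # Backpropagation through time for the loss 0.5 * (y_hat - y)^2.
    grad_b = err * hs[-1]
    grad_W = np.zeros_like(W)
    grad_A = np.zeros_like(A)
    delta = err * b                        # dL/dh_L
    for t in reversed(range(L)):
        dz = delta * (pre[t] > 0)          # through the ReLU
        grad_W += np.outer(dz, hs[t])      # hs[t] is h_{t-1}
        grad_A += np.outer(dz, x_seq[t])
        delta = W.T @ dz                   # dL/dh_{t-1}

    # One step of vanilla SGD on all trainable matrices.
    b -= lr * grad_b
    W -= lr * grad_W
    A -= lr * grad_A

    if step % 500 == 0:
        print(f"step {step:4d}  squared error {0.5 * err**2:.4f}")
```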