Learning Two-Layer Networks with Multinomial Activation and High Thresholds
Giving provable guarantees for learning neural networks is a core challenge of machine learning theory. Most prior work gives parameter recovery guarantees for one-hidden-layer networks. In this work we study a two-layer network in which the top node, instead of computing a sum (as in a one-layer network), computes a well-behaved multivariate polynomial of all its inputs. We show that if the thresholds (biases) of the first-layer neurons are larger than Ω(√d), where d is the input dimension, then the weights are learnable under Gaussian input. Furthermore, even for lower thresholds, we can learn the lowest layer with polynomial sample complexity, although in exponential time. As an application of our results, we give a polynomial-time algorithm for learning an intersection of halfspaces that are Ω(√d) far from the origin under a Gaussian input distribution. Finally, for deep networks of depth greater than two, assuming that the layers from the second onwards can be expressed as a polynomial (for example via a Taylor series), we can learn the lowest layer under the conditions required by our assumptions.
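As a minimal sketch of the model class described above, the snippet below instantiates a two-layer network with high-threshold ReLU units in the first layer and a polynomial top node, evaluated on Gaussian inputs. All names, the specific polynomial, the weight scaling, and the constant in front of √d are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100      # input dimension
k = 5        # number of first-layer neurons
n = 10_000   # number of Gaussian samples

# First-layer weights and high thresholds; the sqrt(d) scaling mirrors the
# Omega(sqrt(d)) threshold regime in the abstract (constants chosen arbitrarily).
W = rng.standard_normal((k, d))        # rows have norm roughly sqrt(d)
t = 1.0 * np.sqrt(d) * np.ones(k)      # biases in the high-threshold regime

def relu(z):
    return np.maximum(z, 0.0)

def network(X):
    """Two-layer model: high-threshold ReLUs fed into a polynomial top node."""
    H = relu(X @ W.T - t)              # first-layer activations, shape (n, k)
    s, q = H.sum(axis=1), (H ** 2).sum(axis=1)
    # Top node: an illustrative multivariate polynomial of the activations,
    # here the sum of the activations plus all pairwise products.
    return s + 0.5 * (s ** 2 - q)

X = rng.standard_normal((n, d))        # samples from the Gaussian input distribution
y = network(X)
print("fraction of samples with at least one active neuron:",
      np.mean((X @ W.T > t).any(axis=1)))
```

With high thresholds most samples activate few or no first-layer neurons, which is the regime in which the abstract claims the weights become learnable.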