Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations
This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded by 1. The latter feature is essential to control generalization errors in many statistical and machine learning applications.
READ FULL TEXT