The Kolmogorov Superposition Theorem Can Break the Curse of Dimensionality When Approximating High-Dimensional Functions
We explain how to use Kolmogorov's Superposition Theorem (KST) to overcome the curse of dimensionality when approximating high-dimensional functions and learning high-dimensional data sets with two-layer neural networks. Specifically, there is a class of functions, called K-Lipschitz continuous in the sense that the K-outer function g of f is Lipschitz continuous, that can be approximated by a two-layer ReLU network with widths dn and n to an approximation order of O(d^2/n). In addition, we show that polynomials of high degree can be expressed by neural networks with the activation function σ_ℓ(t)=(t_+)^ℓ, ℓ ≥ 2, using multiple layers and appropriate widths; the more layers the network has, the higher the degree of the polynomials it can reproduce. Hence, a deep learning algorithm with the high-degree activation function σ_ℓ can approximate multi-dimensional data well as the number of layers increases. Finally, we present a mathematical justification for image classification by deep learning.
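For orientation, a commonly cited single-outer-function form of KST (due to Lorentz) is sketched below; the symbols g, λ_p, and φ_q follow the usual statement of the theorem rather than the paper's own notation, and the way the two network layers are matched to the inner and outer functions is our reading of the abstract, not a quotation from the paper.

\[
  f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} g\!\Big(\sum_{p=1}^{d} \lambda_p\,\phi_q(x_p)\Big),
\]

with a single continuous outer function g and continuous univariate inner functions φ_q and constants λ_p that do not depend on f. Loosely, the first (width dn) layer of the two-layer ReLU network can be read as approximating the inner univariate functions and the second (width n) layer as approximating g, which is where the Lipschitz assumption on the K-outer function g enters. The polynomial claim rests on elementary identities such as

\[
  t^2 = \sigma_2(t) + \sigma_2(-t),
  \qquad
  xy = \tfrac{1}{2}\big((x+y)^2 - x^2 - y^2\big),
\]

so squares and then products of previously built quantities can be assembled layer by layer, which is why deeper networks with the activation σ_ℓ reproduce polynomials of higher degree.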