Stochastic optimization approaches to learning concise representations
We propose and study a method for learning interpretable features via stochastic optimization of feature architectures. Features are represented as multi-type expression trees built from activation functions common in neural networks together with other elementary functions. Continuous features are trained via gradient descent, and the performance of features in ML models is used to weight the rate of change among subcomponents of representations. The search process maintains an archive of representations with accuracy-complexity trade-offs to aid generalization and interpretation. We compare several stochastic optimization approaches within this framework and benchmark the resulting variants against other machine learning approaches on a large set of real-world regression problems. The best results are obtained by search methods that directly optimize the accuracy-complexity trade-off in order to find simple architectures that generalize well.
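To make the setup concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of the core ingredients described above: features encoded as small expression trees over activation and elementary functions, scored by how well a linear model over them fits the data, with an archive that keeps only Pareto-optimal accuracy-complexity trade-offs. All names, the depth-1 tree encoding, and the complexity measure are illustrative simplifications.

```python
# Hypothetical sketch of the abstract's ingredients; not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)

# Unary operators: common NN activations plus other elementary functions.
OPS = {
    "relu": lambda z: np.maximum(z, 0.0),
    "tanh": np.tanh,
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sin": np.sin,
    "square": np.square,
}

def random_feature(n_inputs):
    """A depth-1 'tree': op(w . x + b), with continuous weights w, b."""
    op = rng.choice(list(OPS))
    return {"op": op, "w": rng.normal(size=n_inputs), "b": rng.normal()}

def eval_feature(f, X):
    return OPS[f["op"]](X @ f["w"] + f["b"])

def score(features, X, y):
    """Fit a linear model on the transformed features; return (MSE, complexity)."""
    Phi = np.column_stack([eval_feature(f, X) for f in features] + [np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    mse = np.mean((Phi @ coef - y) ** 2)
    complexity = sum(f["w"].size + 1 for f in features)  # simplified size proxy
    return mse, complexity

def dominates(a, b):
    """Pareto dominance on (mse, complexity): lower is better on both objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy data and a tiny random search maintaining a Pareto archive.
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

archive = []  # list of (objectives, representation)
for _ in range(200):
    rep = [random_feature(X.shape[1]) for _ in range(rng.integers(1, 4))]
    obj = score(rep, X, y)
    if not any(dominates(a, obj) for a, _ in archive):
        archive = [(a, r) for a, r in archive if not dominates(obj, a)]
        archive.append((obj, rep))

for (mse, cx), rep in sorted(archive, key=lambda t: t[0]):
    print(f"mse={mse:.3f}  complexity={cx}  ops={[f['op'] for f in rep]}")
```

In the paper's framework, the random sampling above would be replaced by the studied stochastic optimization strategies (with gradient descent on the continuous weights), and the final archive is what supports model selection along the accuracy-complexity front.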