Deep Learning Partial Least Squares
High dimensional data reduction techniques are provided by using partial least squares within deep learning. Our framework provides a nonlinear extension of PLS together with a disciplined approach to feature selection and architecture design in deep learning. This leads to a statistical interpretation of deep learning that is tailor made for predictive problems. We can use the tools of PLS, such as scree-plot, bi-plot to provide model diagnostics. Posterior predictive uncertainty is available using MCMC methods at the last layer. Thus we achieve the best of both worlds: scalability and fast predictive rule construction together with uncertainty quantification. Our key construct is to employ deep learning within PLS by predicting the output scores as a deep learner of the input scores. As with PLS our X-scores are constructed using SVD and applied to both regression and classification problems and are fast and scalable. Following Frank and Friedman 1993, we provide a Bayesian shrinkage interpretation of our nonlinear predictor. We introduce a variety of new partial least squares models: PLS-ReLU, PLS-Autoencoder, PLS-Trees and PLS-GP. To illustrate our methodology, we use simulated examples and the analysis of preferences of orange juice and predicting wine quality as a function of input characteristics. We also illustrate Brillinger's estimation procedure to provide the feature selection and data dimension reduction. Finally, we conclude with directions for future research.
READ FULL TEXT