Relevant feature extraction for statistical inference
We introduce an algorithm that learns correlations between two datasets in a way that can be used to infer one type of data given the other. The approach allows for the computation of expectation values over the inferred conditional distributions, such as Bayesian estimators and their standard deviations. This is done by learning feature maps that span hyperplanes in the spaces of probabilities for both types of data: the relevant feature spaces. The loss function is chosen such that these spaces of reduced dimensionality tend to be optimal for performing inference, as measured by the χ^2-divergence. Advantages of our algorithm over other approaches include fast convergence and self-regularization, even when applied to simple supervised learning. We propose that, in addition to the many applications where two correlated variables appear naturally, this approach could also be used to identify the dominant independent features of a single dataset in an unsupervised fashion: in this scenario, the second variable should be produced from the original data by adding noise in a manner that defines an appropriate information metric.
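As a rough illustration of this kind of construction (not the paper's exact algorithm, whose loss and architecture are not specified in the abstract), the sketch below trains feature maps f(x) and g(y) whose inner product gives a low-rank approximation of the density ratio p(x,y)/(p(x)p(y)); the quality of such a reduced feature space is naturally measured by the χ^2-divergence. The toy linear-Gaussian data, the network sizes, and the specific low-rank density-ratio loss are all assumptions made for the example. Once the features are learned, conditional expectations such as E[y|x] can be read off from marginal statistics of the features, which is the kind of Bayesian estimator the abstract refers to.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy correlated data (assumption for this sketch): y = x + noise, both 1-D.
n = 20000
x = torch.randn(n, 1)
y = x + 0.5 * torch.randn(n, 1)

k = 4  # dimensionality of the learned feature spaces

f = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, k))  # features of x
g = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, k))  # features of y
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

for step in range(2000):
    idx = torch.randint(n, (1024,))
    fx, gy = f(x[idx]), g(y[idx])
    # Joint term: pairs (x, y) drawn from p(x, y).
    joint = (fx * gy).sum(dim=1).mean()
    # Product-of-marginals term: shuffle y within the batch to decouple it from x.
    gy_shuf = gy[torch.randperm(len(idx))]
    indep = ((fx * gy_shuf).sum(dim=1) ** 2).mean()
    # Low-rank density-ratio loss: up to a constant, this equals the L2 error of
    # f(x)·g(y) against p(x,y)/(p(x)p(y)) under p(x)p(y), so its minimizers span
    # the dominant singular functions of the ratio (the chi^2-optimal features).
    loss = -2.0 * joint + indep
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: with p(y|x) ≈ p(y) f(x)·g(y), expectations over y given x reduce to
# marginal statistics of y weighted by the learned features.
with torch.no_grad():
    Egy_y = (g(y) * y).mean(dim=0)      # E[g(y) y] under the marginal of y
    x_query = torch.tensor([[1.0]])
    y_hat = f(x_query) @ Egy_y          # approximate Bayesian estimator E[y | x]
    print(f"estimated E[y | x=1.0]: {y_hat.item():.3f} "
          f"(exact value for this toy model: 1.000)")
```

The same marginal-statistics trick gives second moments, e.g. E[y^2|x] ≈ f(x)·E[g(y) y^2], from which a standard deviation of the estimator follows; how closely this matches the paper's procedure is an assumption here.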