Geometric k-nearest neighbor estimation of entropy and mutual information
Like most nonparametric estimators of information functionals involving continuous multidimensional random variables, k-nearest-neighbor (kNN) estimators rely on an estimate of the probability density functions (pdfs) of the variables, with the pdfs estimated using spheres in an appropriate norm to represent local volumes. We introduce a new class of kNN estimators, which we call geometric kNN estimators (g-kNN), that use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class, we develop a g-kNN estimator of entropy and mutual information based on elliptical volume elements, which capture the local stretching and compression common to the attractors of a wide range of dynamical systems. There is a trade-off between the amount of local data needed to fit a more complicated local volume element and the improvement in the estimate afforded by the better description of the local geometry. In a series of numerical examples, this g-kNN estimator of mutual information is compared to the Kraskov-Stögbauer-Grassberger (KSG) estimator; we find that the modelling of the local geometry pays off in better estimates, both when the joint distribution is thinly supported and when sample sizes are small. In particular, the examples suggest that g-kNN estimators can be especially relevant to applications in which the system is large but the data size is limited.
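Since the full text is not shown here, the following Python sketch is only a plausible illustration of the contrast described above, not the authors' estimator. `kl_entropy` implements the standard Kozachenko-Leonenko kNN entropy estimate with spherical volume elements; `g_knn_entropy` is a hypothetical elliptical variant in which each point's neighborhood of `m` samples is whitened by its sample covariance before the spherical formula is applied, with the log-volume of the local linear map added back. The function names and the parameter `m` (the local sample size used to fit the ellipsoid) are illustrative assumptions, and the paper's actual construction of elliptical volume elements may differ.

```python
# Sketch contrasting spherical vs. elliptical local volume elements in
# kNN entropy estimation. Assumes points are in general position (no
# duplicates) and that k <= m. Not the paper's implementation.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def kl_entropy(x, k=4):
    """Kozachenko-Leonenko kNN entropy estimate with spherical volumes.

    H_hat = psi(N) - psi(k) + log(c_d) + (d/N) * sum_i log(eps_i),
    where eps_i is the distance from x_i to its k-th neighbor and c_d
    is the volume of the unit Euclidean ball in d dimensions.
    """
    n, d = x.shape
    tree = cKDTree(x)
    # query returns the point itself first, so ask for k+1 neighbors
    eps = tree.query(x, k=k + 1)[0][:, -1]
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))


def g_knn_entropy(x, k=4, m=20):
    """Hypothetical g-kNN-style estimate with elliptical volumes.

    Fit an ellipsoid to each point's m nearest neighbors via the local
    sample covariance, whiten the neighborhood, apply the spherical
    formula in whitened coordinates, and add back the log-volume of
    the local linear map (which turns the sphere into an ellipsoid).
    """
    n, d = x.shape
    tree = cKDTree(x)
    log_eps = np.empty(n)
    log_vol = np.empty(n)
    for i in range(n):
        idx = tree.query(x[i], k=m + 1)[1][1:]         # m nearest neighbors
        local = x[idx] - x[i]                          # centered neighborhood
        cov = local.T @ local / m + 1e-12 * np.eye(d)  # local ellipsoid shape
        L = np.linalg.cholesky(cov)
        white = np.linalg.solve(L, local.T).T          # whitened neighborhood
        # k-th neighbor radius among the m local points after whitening;
        # an approximation, since ranks can change under the new metric
        r = np.sort(np.linalg.norm(white, axis=1))[k - 1]
        log_eps[i] = np.log(r)
        log_vol[i] = np.log(np.linalg.det(L))          # log-volume of the map
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return (digamma(n) - digamma(k) + log_c_d
            + d * np.mean(log_eps) + np.mean(log_vol))
```

In this sketch, a mutual information estimate would follow from the identity I(X;Y) = H(X) + H(Y) - H(X,Y) applied with the same entropy estimator; a KSG-style estimator instead couples the neighbor counts across the marginal and joint spaces, which is one of the baselines the paper compares against.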