A Data Dependent Algorithm for Querying Earth Mover's Distance with Low Doubling Dimension
In this paper, we consider the following query problem: given two weighted point sets A and B in the Euclidean space R^d, we want to quickly determine that whether their earth mover's distance (EMD) is larger or smaller than a pre-specified threshold T≥ 0. The problem finds a number of important applications in the fields of machine learning and data mining. In particular, we assume that the dimensionality d is not fixed and the sizes |A| and |B| are large. Therefore, most of existing EMD algorithms are not quite efficient to solve this problem. Here, we consider the problem under the assumption that A and B have low doubling dimension, which is common for high-dimensional data in real world. Inspired by the geometric method net tree, we propose a novel "data-dependent" algorithm to solve this problem efficiently. We also study the performance of our method on real datasets, and the experimental results suggest that our method can save a large amount of running time comparing with existing EMD algorithms.
READ FULL TEXT