A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

12/22/2016
by   Deng Cai, et al.
0

Approximate Nearest Neighbor (ANN) search is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms are proposed to solve this problem. Every proposed algorithm claims outperforms other state-of-the-art methods. However, there are serious drawbacks in the evaluation of existing hashing papers and most of the claims in these papers should be re-examined. 1) Most of the existing papers failed to correctly measure the search time which is essential for the ANN search problem. 2) As a result, most of the papers report the performance increases as the code length increases, which is wrong if we measure the search time correctly. 3) The performance of some hashing algorithms (e.g., LSH) can easily be boosted if one uses multiple hash tables, which is an important factor should be considered in the evaluation while most of the papers failed to do so. In this paper, we carefully revisit many popular hashing algorithms and suggest one possible promising direction. For the sake of reproducibility, all the codes used in the paper are released on Github, which can be used as a testing platform to fairly compare various hashing algorithms.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset