Fast Near Neighbor Search in High-Dimensional Binary Data
Shrivastava, Anshumali; Li, Ping
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.
Distributed Mining; Parallel Mining; Big Data
Published in 2012 on proceedings website
Previously Published As
Anshumali Shrivastava and Ping Li. Fast Near Neighbor Search in High-Dimensional Binary Data. Proceedings of The European Conference on Machine Learning (ECML 2012)