Fast Near Neighbor Search in High-Dimensional Binary Data
File(s)ShrivastavaPing2012.pdf (1.49 MB)
Main article as presented at ECML 2012
Permanent Link(s)
Collections
Author
Shrivastava, Anshumali
Li, Ping
Abstract
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.
Sponsorship
NSF Grant #1131848
Date Issued
2013-10-23
Publisher
Internetware 2013
Keywords
Related Version
Published in 2012 on proceedings website
Previously Published as
Anshumali Shrivastava and Ping Li. Fast Near Neighbor Search in High-Dimensional Binary Data. Proceedings of The European Conference on Machine Learning (ECML 2012)
Type
preprint
