Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
DigitalCollections@ILR
ILR School
  1. Home
  2. ILR School
  3. Centers, Institutes, Programs
  4. Labor Dynamics Institute
  5. NSF Census Research Network
  6. Cornell University NCRN node
  7. Fast Near Neighbor Search in High-Dimensional Binary Data

Fast Near Neighbor Search in High-Dimensional Binary Data

File(s)
ShrivastavaPing2012.pdf (1.49 MB)
Main article as presented at ECML 2012
Permanent Link(s)
https://hdl.handle.net/1813/37987
Collections
Cornell University NCRN node
Author
Shrivastava, Anshumali
Li, Ping
Abstract

Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.

Sponsorship
NSF Grant #1131848
Date Issued
2013-10-23
Publisher
Internetware 2013
Keywords
Distributed Mining
•
Parallel Mining
•
Big Data
Related Version
Published in 2012 on proceedings website
Previously Published as
Anshumali Shrivastava and Ping Li. Fast Near Neighbor Search in High-Dimensional Binary Data. Proceedings of The European Conference on Machine Learning (ECML 2012)
Type
preprint

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance