b-Bit Minwise Hashing in Practice
Li, Ping; Shrivastava, Anshumali; König, Arnd Christian
Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications.
NSF Grant #1131848.
Fifth Asia-Pacific Symposium on Internetware
Previously Published As
Ping Li, Anshumali Shrivastava and Arnd Christian König. b-Bit Minwise Hashing in Practice. Internetware 2013. October 2013.