JavaScript is disabled for your browser. Some features of this site may not work without it.
Probabilistic Hashing Techniques For Big Data

Author
Shrivastava, Anshumali
Abstract
We investigate probabilistic hashing techniques for addressing computational and memory challenges in large scale machine learning and data mining systems. In this thesis, we show that the traditional idea of hashing goes far beyond near-neighbor search and there are some striking new possibilities. We show that hashing can improve state of the art large scale learning algorithms, and it goes beyond the conventional notions of pairwise similarities. Despite being a very well studied topic in literature, we found several opportunities for fundamentally improving some of the well know textbook hashing algorithms. In particular, we show that the traditional way of computing minwise hashes is unnecessarily expensive and without loosing anything we can achieve an order of magnitude speedup. We also found that for cosine similarity search there is a better scheme than SimHash. In the end, we show that the existing locality sensitive hashing framework itself is very restrictive, and we cannot have efficient algorithms for some important measures like inner products which are ubiquitous in machine learning. We propose asymmetric locality sensitive hashing (ALSH), an extended framework, where we show provable and practical efficient algorithms for Maximum Inner Product Search (MIPS). Having such an efficient solutions to MIPS directly scales up many popular machine learning algorithms. We believe that this thesis provides significant improvements to some of the heavily used subroutines in big-data systems, which we hope will be adopted.
Date Issued
2015-08-17Subject
Large Scale Machine Learning; Randomized Algorithms for Big-Data; Hashing, Sketching
Committee Chair
Li,Ping
Committee Member
Gehrke,Johannes E.; Samorodnitsky,Gennady
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Type
dissertation or thesis