Sublinear Algorithms for Statistical, Markov Chain, and Binpacking Problems
White, Patrick Edward
We consider the problem of how to construct algorithms which deal efficiently with large amounts of data. We give new algorithms which use time and communication resources that are sublinear in the problem size for problems in various domains including statistics and combinatorics. We first consider properties of random variables. We begin with the problem of distinguishing whether two distributions over the same domain are close or far in both the $L_1$ and the $L_2$ norms. We investigate two models for representing a distribution. In one model, elements of a sample space are generated on request according to a fixed but unknown distribution. In the other, the probability assigned to each element is given explicitly in an array. We present algorithms in two settings: (1) when both distributions are represented in the first model; and, (2) when one of each representation is given. We show that the first setting is provably easier than the second setting. Next we give algorithms for testing whether two random variables are independent. In all of our algorithms, the number of samples required from the input distributions is sublinear in the domain size and nearly optimal. We then consider properties of data. Specifically, we give an algorithm which determines if a Markov Chain is rapidly mixing in sublinear time, assuming the input is in a form which allows for easy generation of sequential nodes in a random walk. Our test distinguishes Markov chains which are rapidly mixing from those which cannot be made rapidly mixing by changing a small number of edges. Finally we turn to a model in which the help of an untrusted entity is used to reliably solve a problem in sublinear time. For the problem of multidimensional bin-packing, we give an algorithm which can verify the goodness of a potential solution in sublinear time. To do this we develop tools which allow one to test that a function is approximately monotone.
Binpacking; Distribution Testing; Markov Chain; Property Testing; Sublinear Algorithms
Shmoys, David B.; Selman, Bart; Rubinfeld, Ronitt
Ph. D., Computer Science
Doctor of Philosophy
dissertation or thesis