Sublinear Algorithms for Statistical, Markov Chain, and Binpacking Problems

Other Titles
Abstract
We consider the problem of how to construct algorithms which deal efficiently with large amounts of data. We give new algorithms which use time and communication resources that are sublinear in the problem size for problems in various domains including statistics and combinatorics. We first consider properties of random variables. We begin with the problem of distinguishing whether two distributions over the same domain are close or far in both the $L_1$ and the $L_2$ norms. We investigate two models for representing a distribution. In one model, elements of a sample space are generated on request according to a fixed but unknown distribution. In the other, the probability assigned to each element is given explicitly in an array. We present algorithms in two settings: (1) when both distributions are represented in the first model; and, (2) when one of each representation is given. We show that the first setting is provably easier than the second setting. Next we give algorithms for testing whether two random variables are independent. In all of our algorithms, the number of samples required from the input distributions is sublinear in the domain size and nearly optimal. We then consider properties of data. Specifically, we give an algorithm which determines if a Markov Chain is rapidly mixing in sublinear time, assuming the input is in a form which allows for easy generation of sequential nodes in a random walk. Our test distinguishes Markov chains which are rapidly mixing from those which cannot be made rapidly mixing by changing a small number of edges. Finally we turn to a model in which the help of an untrusted entity is used to reliably solve a problem in sublinear time. For the problem of multidimensional bin-packing, we give an algorithm which can verify the goodness of a potential solution in sublinear time. To do this we develop tools which allow one to test that a function is approximately monotone.
Journal / Series
Volume & Issue
Description
96 pages
Sponsorship
Date Issued
2019-12
Publisher
Keywords
Binpacking; Distribution Testing; Markov Chain; Property Testing; Sublinear Algorithms
Location
Effective Date
Expiration Date
Sector
Employer
Union
Union Local
NAICS
Number of Workers
Committee Chair
Tardos, Éva
Committee Co-Chair
Committee Member
Shmoys, David B.
Selman, Bart
Rubinfeld, Ronitt
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Related Version
Related DOI
Related To
Related Part
Based on Related Item
Has Other Format(s)
Part of Related Item
Related To
Related Publication(s)
Link(s) to Related Publication(s)
References
Link(s) to Reference(s)
Previously Published As
Government Document
ISBN
ISMN
ISSN
Other Identifiers
Rights
Rights URI
Types
dissertation or thesis
Accessibility Feature
Accessibility Hazard
Accessibility Summary
Link(s) to Catalog Record