eCommons

 

Statistical Inference under Information Constraints

Other Titles

Author(s)

Abstract

Data from user devices form the backbone of modern learning systems. Due to the distributed nature of the devices and the enormous size of the data, the access to the data is often limited by various information constraints such as privacy constraints and bandwidth limits. This thesis contributes to tackling crucial challenges in statistical inference under information constraints, in both designing efficient algorithms in the constrained settings, and characterizing the information-theoretic limits. We list some major contributions below. The role of interactivity in constrained inference. Despite a large literature on information-constrained inference, the role of interactivity is still inadequately understood. We develop a unified "plug-and-play" framework to prove information-theoretic limits in interactive inference. The framework works under minimal regularity conditions, which enables us to close several gaps in the literature. We provide the first tight bounds for discrete distribution estimation under communication and local differential privacy (LDP) constraints, showing that interactivity doesn't help in these settings. On the other hand, for more structured tasks such as sparse high-dimensional mean estimation, we show that interactivity allows one to obtain provably more sample-efficient algorithms. Reducing communication and computation cost in private estimation. For k-ary discrete distribution estimation under \eps-LDP constraints, with n users, all previously known sample optimal algorithms require linear (in k) communication from each user in the high privacy regime (\eps= O(1)), and has time complexity \Theta (n*k). We propose sample-optimal schemes with communication cost of at most \tilde{O}(1) bits per user, and nearly linear running time of \tilde{O}(n+k). We further study the communication complexity of private estimation and demonstrate a separation between symmetric and asymmetric schemes. Inference with multiple sample per user. While more samples provide more information, it is not clear how to utilize it under stringent information constraints. For discrete distribution estimation under l-bit communication constraints, we provide novel statistically optimal algorithms which demonstrate that having m samples can improve the l1 risk by 1/sqrt{m} for small m. Our result also provides an interesting phase transition when m increases. The dependence on l decays as \Theta(1/2^l) when m is small and the rate becomes \Theta(1/l) for large m. Manipulation attack in statistical inference. Distributed systems are also subject to faculty or adversarial behaviors from local agents. We study discrete distribution testing and estimation in the strong contamination model, where an adversary can change \gamma-fraction of users' samples arbitrarily. We show that for identity testing of discrete distributions, the error caused by adversarial attack can be polynomially higher than \Theta(\gamma), which is optimal for various estimation tasks. We also show that for distribution estimation, information constraints such as communication constraints can lead to a loss polynomially higher than the \Theta(\gamma) rate in the unconstrained setting.

Journal / Series

Volume & Issue

Description

302 pages

Sponsorship

Date Issued

2022-05

Publisher

Keywords

Distributed learning; Information theory; Private data analysis; Statistical inference

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Acharya, Jayadev

Committee Co-Chair

Committee Member

Weinberger, Kilian Quirin
Sridharan, Karthik
Wagner, Aaron B.

Degree Discipline

Electrical and Computer Engineering

Degree Name

Ph. D., Electrical and Computer Engineering

Degree Level

Doctor of Philosophy

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

dissertation or thesis

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record