Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Cornell University Graduate School
  3. Cornell Theses and Dissertations
  4. Statistical Inference under Information Constraints

Statistical Inference under Information Constraints

File(s)
Sun_cornellgrad_0058F_13088.pdf (2.23 MB)
Permanent Link(s)
https://doi.org/10.7298/jsh4-hd04
https://hdl.handle.net/1813/111796
Collections
Cornell Theses and Dissertations
Author
Sun, Ziteng
Abstract

Data from user devices form the backbone of modern learning systems. Due to the distributed nature of the devices and the enormous size of the data, the access to the data is often limited by various information constraints such as privacy constraints and bandwidth limits. This thesis contributes to tackling crucial challenges in statistical inference under information constraints, in both designing efficient algorithms in the constrained settings, and characterizing the information-theoretic limits. We list some major contributions below. The role of interactivity in constrained inference. Despite a large literature on information-constrained inference, the role of interactivity is still inadequately understood. We develop a unified "plug-and-play" framework to prove information-theoretic limits in interactive inference. The framework works under minimal regularity conditions, which enables us to close several gaps in the literature. We provide the first tight bounds for discrete distribution estimation under communication and local differential privacy (LDP) constraints, showing that interactivity doesn't help in these settings. On the other hand, for more structured tasks such as sparse high-dimensional mean estimation, we show that interactivity allows one to obtain provably more sample-efficient algorithms. Reducing communication and computation cost in private estimation. For k-ary discrete distribution estimation under \eps-LDP constraints, with n users, all previously known sample optimal algorithms require linear (in k) communication from each user in the high privacy regime (\eps= O(1)), and has time complexity \Theta (n*k). We propose sample-optimal schemes with communication cost of at most \tilde{O}(1) bits per user, and nearly linear running time of \tilde{O}(n+k). We further study the communication complexity of private estimation and demonstrate a separation between symmetric and asymmetric schemes. Inference with multiple sample per user. While more samples provide more information, it is not clear how to utilize it under stringent information constraints. For discrete distribution estimation under l-bit communication constraints, we provide novel statistically optimal algorithms which demonstrate that having m samples can improve the l1 risk by 1/sqrt{m} for small m. Our result also provides an interesting phase transition when m increases. The dependence on l decays as \Theta(1/2^l) when m is small and the rate becomes \Theta(1/l) for large m. Manipulation attack in statistical inference. Distributed systems are also subject to faculty or adversarial behaviors from local agents. We study discrete distribution testing and estimation in the strong contamination model, where an adversary can change \gamma-fraction of users' samples arbitrarily. We show that for identity testing of discrete distributions, the error caused by adversarial attack can be polynomially higher than \Theta(\gamma), which is optimal for various estimation tasks. We also show that for distribution estimation, information constraints such as communication constraints can lead to a loss polynomially higher than the \Theta(\gamma) rate in the unconstrained setting.

Description
302 pages
Date Issued
2022-05
Keywords
Distributed learning
•
Information theory
•
Private data analysis
•
Statistical inference
Committee Chair
Acharya, Jayadev
Committee Member
Weinberger, Kilian Quirin
Sridharan, Karthik
Wagner, Aaron B.
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph. D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/15529887

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance