Cornell University Library

eCommons


SEQUENTIAL METHODS FOR LEARNING AND INFERENCE UNDER UNKNOWN MODELS

File(s)
Vakili_cornellgrad_0058F_10481.pdf (1.19 MB)
Permanent Link(s)
https://doi.org/10.7298/X4J38QPZ
https://hdl.handle.net/1813/56770
Collections
Cornell Theses and Dissertations
Author
Vakili, Sattar
Abstract

This dissertation focuses on sequential learning and inference under unknown models. In this class of problems, observations arrive one at a time, and optimal actions cannot be determined a priori. Solutions require optimization over time: learning optimally from the past and then acting accordingly with foresighted planning. The first part of the dissertation is on sequential learning within the framework of multi-armed bandit (MAB) theory. Originating in the 1930s, MAB problems have traditionally been studied under the following modeling assumptions: the reward distributions have bounded support, learning is centralized with a single player, the reward distributions are time-invariant, and the performance measure mainly targets maximizing the expected return of a sequential learning policy. This dissertation relaxes these assumptions and develops parallel results applicable to a wider range of applications, in particular emerging applications such as online ad placement and web search, as well as social and economic networks. The main results in the first part include a new policy based on a deterministic sequencing of exploration and exploitation for the classic MAB. The structure of the proposed policy allows easy extensions to variations of the MAB problem, including decentralized MAB with multiple players and incomplete reward observations under collisions. The proposed policy achieves the optimal logarithmic regret order under general reward distribution models such as sub-Gaussian and heavy-tailed distributions. Time variation in the reward process is also addressed by considering both an arbitrary time-variation model and a piecewise-stationary model. A lower bound on the regret order is obtained, and the order optimality of a sequential learning policy is shown.
The issue of risk in multi-armed bandit problems is considered, and parallel results are developed under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. It is shown that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by $\Omega(\log T)$ and $\Omega(T^{\frac{2}{3}})$, respectively, and that variations of the policies developed for the classic risk-neutral MAB achieve these lower bounds. A sequential learning policy is also developed under the measure of value at risk, another common risk measure in financial mathematics. In addition, a minimal form of side information on the reward model is introduced that can lead to bounded regret and thus complete learning; given this side information, a sequential learning policy with bounded regret is proposed. The second part of the dissertation is on sequential inference, which can be categorized into three classes of problems: sequential hypothesis testing, change detection, and active hypothesis testing. These classic problems date back to the 1940s, with pioneering works by Wald, Chernoff, Page, and Shewhart. The focus of this dissertation is on time-varying models, hierarchical observations, and non-parametric composite hypotheses, motivated by recent engineering applications such as short-term instability detection from PMU measurements in power systems and the detection of heavy hitters and denial-of-service attacks in the Internet and in communication and computer networks. The main results in the second part include asymptotically optimal tests for the sequential hypothesis testing and change-point detection problems under a time-varying distribution model suitable for instability detection applications. The asymptotic optimality of the tests is proven by establishing fundamental limits on the Bayesian cost of any test.
The problem of active inference under hierarchical observations and unknown, non-parametric models is also considered. A policy that interactively chooses the observation points is proposed and shown to achieve optimal logarithmic-order sample complexity in both the problem size and the reliability constraint.
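To illustrate the flavor of a policy based on a deterministic sequencing of exploration and exploitation, the following is a minimal sketch, not the dissertation's actual algorithm: the function name, the exploration budget schedule w*k*log(t), and the two-arm usage below are illustrative assumptions.

```python
import math

def dsee_bandit(arms, horizon, w=2.0):
    """Sketch of a DSEE-style policy: pull arms in a deterministic
    round-robin whenever the exploration budget w*k*log(t) exceeds the
    number of exploration pulls made so far; otherwise exploit the arm
    with the best empirical mean. `arms` is a list of zero-argument
    callables, each returning a reward sample for that arm."""
    k = len(arms)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # cumulative reward per arm
    explored = 0              # total exploration pulls so far
    total = 0.0
    for t in range(1, horizon + 1):
        if min(counts) == 0 or explored < w * k * math.log(t + 1):
            arm = explored % k            # deterministic exploration order
            explored += 1
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])  # exploit
        r = arms[arm]()
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts
```

Because the exploration epochs grow only logarithmically, exploitation of the empirical best arm dominates the horizon; this is the structural property that, under suitable reward models, yields logarithmic regret order.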

Date Issued
2017-08-30
Keywords
Multi-Armed Bandit • Risk-aversion • Sequential inference • Sequential learning • Time-varying models • Unknown models • Computer science • Electrical engineering • Operations research
Committee Chair
Zhao, Qing
Committee Member
Bitar, Eilyan Yamen
Wagner, Aaron B.
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph.D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
