dc.contributor.author  Vakili, Sattar
dc.date.accessioned  2018-04-26T14:16:04Z
dc.date.available  2018-04-26T14:16:04Z
dc.date.issued  2017-08-30
dc.identifier.other  Vakili_cornellgrad_0058F_10481
dc.identifier.other  http://dissertations.umi.com/cornellgrad:10481
dc.identifier.other  bibid: 10361449
dc.identifier.uri  https://hdl.handle.net/1813/56770
dc.description.abstract  This dissertation focuses on sequential learning and inference under unknown models. In this class of problems, observations are available one at a time, and optimal actions cannot be determined \emph{a priori}. Solutions to such problems require optimization over time: learning optimally from the past and then acting accordingly with foresighted planning. The first part of the dissertation is on sequential learning within the framework of multi-armed bandit (MAB) theory. Originating in the 1930s, MAB problems have traditionally been studied under the following modeling assumptions: the reward distributions have bounded support, learning is centralized with a single player, the reward distributions are time-invariant, and the performance measure mainly targets maximizing the expected return of a sequential learning policy. This dissertation aims to relax these modeling assumptions and develop parallel results applicable to a wider range of applications, in particular emerging applications such as online ad placement and web search as well as social and economic networks. The main results in the first part include a new policy based on a deterministic sequencing of exploration and exploitation for the classic MAB. The structure of the proposed policy allows easy extension to variations of the MAB problem, including decentralized MAB with multiple players and incomplete reward observations under collisions. The proposed policy achieves the optimal logarithmic \emph{regret} order under general reward distribution models such as sub-Gaussian and heavy-tailed distributions.
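The deterministic sequencing of exploration and exploitation described above can be illustrated with a minimal sketch. This is not the dissertation's exact policy: the logarithmic exploration schedule (with an assumed tuning constant `w`), the round-robin exploration order, and the empirical-mean exploitation rule are illustrative assumptions.

```python
import math

def dsee(arms, horizon, w=1.0):
    """Sketch of a deterministic exploration/exploitation sequencing policy.

    Round t is an exploration round whenever the number of exploration
    rounds so far is below w * k * log(t + 1); exploration cycles through
    the arms round-robin.  Otherwise the arm with the best empirical mean
    is exploited.  `arms` is a list of zero-argument reward samplers.
    Returns the total collected reward.
    """
    k = len(arms)
    counts = [0] * k       # pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    explored = 0           # exploration rounds used so far
    total = 0.0
    for t in range(1, horizon + 1):
        if explored < w * k * math.log(t + 1):      # exploration epoch
            i = explored % k                        # round-robin over arms
            explored += 1
        else:                                       # exploitation epoch
            i = max(range(k), key=lambda j: sums[j] / counts[j])
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return total
```

Because the exploration schedule is deterministic and only logarithmically dense, the number of pulls of suboptimal arms grows as O(log T), which is the mechanism behind the logarithmic regret order mentioned in the abstract.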
The time variation in the reward process is also addressed, under both an arbitrary time-variation model and a piecewise-stationary model. A lower bound on the regret order is obtained, and order optimality of a sequential learning policy is shown. The issue of risk in multi-armed bandit problems is then considered, and parallel results are developed under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. It is shown that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by $\Omega(\log T)$ and $\Omega(T^{\frac{2}{3}})$, respectively. It is also shown that variations of the policies developed for the classic risk-neutral MAB achieve these lower bounds. A sequential learning policy is also developed under the measure of value at risk, another common risk measure in financial mathematics. In addition, minimal side information on the reward model is introduced that can lead to bounded regret and thus complete learning; given this side information, a sequential learning policy with bounded regret is proposed. The second part of the dissertation is on sequential inference, which can be categorized into three classes of problems: sequential hypothesis testing, change detection, and active hypothesis testing. These classic problems date back to the 1940s, with pioneering work by Wald, Chernoff, Page, and Shewhart. The focus of this dissertation is on time-varying models, hierarchical observations, and non-parametric composite hypotheses, motivated by recent engineering applications such as short-term instability detection from PMU measurements in power systems and the detection of heavy hitters and denial-of-service attacks in the Internet and in communication and computer networks.
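Under the mean-variance measure referenced above, an arm is scored by a combination of the variance and the mean of its rewards. One common convention, used here purely as an illustration (the sign convention and the risk-tolerance parameter `rho` are assumptions, not the dissertation's exact definitions), scores an arm as sample variance minus `rho` times sample mean, to be minimized:

```python
def empirical_mean_variance(samples, rho=1.0):
    """Empirical mean-variance score of one arm's observed rewards.

    Returns sample_variance - rho * sample_mean (lower is better under
    this convention); rho trades off risk against expected return.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return var - rho * mean
```

A risk-averse policy would estimate this quantity per arm from past observations and exploit the arm with the smallest score, in place of the largest empirical mean used in the risk-neutral setting.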
The main results in the second part include asymptotically optimal tests for the sequential hypothesis testing and change-point detection problems under a time-varying distribution model suitable for instability detection applications. The asymptotic optimality of the tests is proven by establishing fundamental limits on the Bayesian cost of any test. The problem of active inference under hierarchical observations and unknown, non-parametric models is also considered. A policy that interactively chooses the observation points is proposed and shown to achieve a sample complexity of optimal logarithmic order in both the problem size and the reliability constraint.
dc.language.iso  en_US
dc.subject  Multi-Armed Bandit
dc.subject  Risk-aversion
dc.subject  Sequential inference
dc.subject  Sequential learning
dc.subject  Time-varying models
dc.subject  Unknown models
dc.subject  Computer science
dc.subject  Electrical engineering
dc.subject  Operations research
dc.title  SEQUENTIAL METHODS FOR LEARNING AND INFERENCE UNDER UNKNOWN MODELS
dc.type  dissertation or thesis
thesis.degree.discipline  Electrical and Computer Engineering
thesis.degree.grantor  Cornell University
thesis.degree.level  Doctor of Philosophy
thesis.degree.name  Ph. D., Electrical and Computer Engineering
dc.contributor.chair  Zhao, Qing
dc.contributor.committeeMember  Bitar, Eilyan Yamen
dc.contributor.committeeMember  Wagner, Aaron B.
dcterms.license  https://hdl.handle.net/1813/59810
dc.identifier.doi  https://doi.org/10.7298/X4J38QPZ
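The change-detection problems in the abstract build on Page's classical CUSUM procedure. For reference, a minimal sketch under known, time-invariant pre- and post-change densities — an assumption for illustration only, since the dissertation treats time-varying and non-parametric models:

```python
import math

def cusum(observations, f0, f1, threshold):
    """Sketch of Page's CUSUM test for quickest change detection.

    f0 and f1 are the pre- and post-change densities (assumed known).
    The running statistic S_t = max(0, S_{t-1} + log(f1(x_t) / f0(x_t)))
    accumulates log-likelihood-ratio evidence of a change; an alarm is
    raised at the first time S_t crosses the threshold.  Returns that
    time (1-indexed), or None if the threshold is never crossed.
    """
    s = 0.0
    for t, x in enumerate(observations, start=1):
        s = max(0.0, s + math.log(f1(x) / f0(x)))
        if s >= threshold:
            return t
    return None
```

The threshold controls the trade-off between false-alarm rate and detection delay, which is the kind of trade-off formalized by the Bayesian-cost limits mentioned in the abstract.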