eCommons


Risk and Safety in Online Learning and Optimization: Theory and Applications

Abstract

This dissertation focuses on risk and safety considerations in the design and analysis of online learning algorithms for sequential decision-making problems under uncertainty. The motivating application for the mathematical models and methods developed in this dissertation is demand response: the general family of mechanisms designed to improve the efficiency and reliability of electric power systems by affecting the demand of residential customers. First, we design a risk-sensitive online learning algorithm for linear models. In particular, we consider the setting in which an electric power utility seeks to curtail its peak electricity demand by offering a fixed group of customers a uniform price for reductions in consumption relative to their predetermined baselines. The underlying demand curve, which describes the aggregate reduction in consumption in response to the offered price, is assumed to be affine and subject to unobservable random shocks. Assuming that both the parameters of the demand curve and the distribution of the random shocks are initially unknown to the utility, we investigate the extent to which the utility might dynamically adjust its offered prices to maximize its cumulative risk-sensitive payoff over a horizon of T days. To do so effectively, the utility must design its pricing policy to balance the trade-off between the need to learn the unknown demand model (exploration) and the desire to maximize its payoff (exploitation) over time. We propose a semi-greedy pricing policy, and show that its expected regret, defined as the risk-sensitive payoff loss over T days relative to an oracle pricing policy that knows the underlying demand model, is no more than O(√T log(T)). Moreover, the proposed pricing policy is shown to yield a sequence of prices that converges to the oracle optimal prices in the mean square sense.
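The exploration–exploitation mechanics of a semi-greedy pricing policy can be sketched as follows. This is a minimal illustration, not the dissertation's algorithm: the demand parameters, the per-unit value v, the price range, and the square-schedule for forced exploration are all assumptions, and the risk-sensitive objective is replaced by a plain expected payoff for simplicity.

```python
import numpy as np

def semi_greedy_pricing(T, a=8.0, b=-2.0, v=3.0, noise_std=0.5, seed=0):
    """Sketch of a semi-greedy pricing policy for an affine demand curve
    D(p) = a + b*p + shock, with (a, b) unknown to the utility.

    Illustrative payoff: value v per unit of reduction minus the payment
    p per unit, i.e. (v - p) * D(p). All constants are assumptions.
    """
    rng = np.random.default_rng(seed)
    prices, demands = [], []
    est = np.zeros(2)                          # running estimate of (a, b)
    for t in range(1, T + 1):
        if t <= 2 or int(np.sqrt(t)) ** 2 == t:
            # Forced exploration on a sparse (~sqrt(T)-sized) schedule
            # keeps the price regression well conditioned.
            p = rng.uniform(0.5, 4.0)
        else:
            a_hat, b_hat = est
            if b_hat < 0:
                # Greedy price maximizing (v - p)(a_hat + b_hat * p).
                p = (v * b_hat - a_hat) / (2 * b_hat)
            else:
                p = 1.0                        # fallback while estimates are poor
            p = float(np.clip(p, 0.5, 4.0))
        d = a + b * p + rng.normal(0.0, noise_std)   # observed aggregate reduction
        prices.append(p)
        demands.append(d)
        # Ordinary least-squares refit of the affine demand model.
        X = np.column_stack([np.ones(len(prices)), prices])
        est, *_ = np.linalg.lstsq(X, np.array(demands), rcond=None)
    return est, prices
```

Because exploration rounds grow only as √T while the remaining rounds price greedily against the current estimate, the learning cost of this schedule is consistent with a regret of the √T-with-logarithmic-factors type stated above.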
Second, we develop an online learning algorithm for linear models subject to stagewise safety constraints. More specifically, we introduce the safe linear stochastic bandit framework, a generalization of linear stochastic bandits in which, at each stage, the learner is required to select an arm whose expected reward is no less than a predetermined (safe) threshold with high probability. We assume that the learner initially knows one arm that is guaranteed to be safe, but not necessarily optimal. Leveraging this assumption, we introduce a learning algorithm that systematically combines known safe arms with exploratory arms to safely expand the set of safe arms over time, while facilitating safe greedy exploitation in subsequent stages. In addition to ensuring the satisfaction of the safety constraint at every stage of play, the proposed algorithm is shown to exhibit an expected regret that is no more than O(√(T log(T))) after T stages of play.

Third, we extend the methodology developed for linear models to design an online learning algorithm with near-optimal performance for a more general class of nonparametric smooth reward models. Specifically, we adopt the perspective of an aggregator that seeks to coordinate its purchase of demand reductions from a fixed group of residential electricity customers with its sale of the aggregate demand reduction in a two-settlement wholesale energy market. The aggregator procures reductions in demand by offering its customers a uniform price for reductions in consumption relative to their predetermined baselines. Prior to its realization of the aggregate demand reduction, the aggregator must also determine how much energy to sell into the two-settlement energy market. In the day-ahead market, the aggregator commits to a forward contract, which calls for the delivery of energy in the real-time market.
The underlying aggregate demand curve, which relates the aggregate demand reduction to the aggregator's offered price, is assumed to be unknown and subject to unobservable random shocks. Assuming that both the demand curve and the distribution of the random shocks are initially unknown to the aggregator, we investigate the extent to which the aggregator might dynamically adapt its offered prices and forward contracts to maximize its expected profit over a time window of T days. Specifically, we design a dynamic pricing and contract offering policy that balances the aggregator's need to learn the unknown demand model against its desire to maximize its cumulative expected profit over time. The proposed policy is proven to incur an expected regret over T days that is no greater than O(√T log²(T)).
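The safe-expansion idea from the second contribution — mixing a known safe arm with exploratory arms and certifying new actions via confidence bounds — can be sketched as follows. The arms, the confidence radius beta, and the threshold are illustrative assumptions, not the dissertation's exact construction.

```python
import numpy as np

def safe_linear_bandit(T=3000, seed=0):
    """Sketch of safe exploration in a linear stochastic bandit: an action
    may be played only when its reward lower confidence bound clears a
    safety threshold, starting from one arm known a priori to be safe."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.8, 0.6])              # unknown true reward parameter
    arms = np.array([[1.0, 0.0],              # arm 0: known safe, suboptimal
                     [0.0, 1.0],
                     [0.7, 0.7]])             # arm 2: optimal, initially uncertified
    threshold, beta, noise = 0.5, 2.0, 0.1    # safety level, confidence radius
    A, b = np.eye(2), np.zeros(2)             # ridge-regularized design / moments
    min_true = np.inf
    for t in range(T):
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)

        def lcb(x):
            return x @ theta_hat - beta * np.sqrt(x @ A_inv @ x)

        def ucb(x):
            return x @ theta_hat + beta * np.sqrt(x @ A_inv @ x)

        # Candidate actions: convex mixtures of the known safe arm with each
        # arm; alpha = 0 recovers the known safe arm itself. Small mixing
        # weights let the learner probe new directions without risking safety.
        cands = [(1 - al) * arms[0] + al * arms[j]
                 for j in range(len(arms))
                 for al in np.linspace(0.0, 1.0, 11)]
        feas = [x for x in cands if lcb(x) >= threshold]
        # Greedy exploitation among certified-safe actions; fall back on
        # the known safe arm when nothing else is certified yet.
        x = max(feas, key=ucb) if feas else arms[0]
        min_true = min(min_true, float(theta @ x))   # diagnostic only
        r = theta @ x + rng.normal(0.0, noise)
        A += np.outer(x, x)
        b += r * x
    return x, min_true, theta_hat
```

As data accumulate, confidence widths shrink, progressively larger mixing weights become certified, and play converges to the optimal arm while every stage's action satisfies the safety constraint with high probability.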

Description

149 pages

Date Issued

2020-05

Committee Chair

Bitar, Eilyan

Committee Member

Zhao, Qing
Henderson, Shane

Degree Discipline

Electrical and Computer Engineering

Degree Name

Ph.D., Electrical and Computer Engineering

Degree Level

Doctor of Philosophy

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Types

dissertation or thesis
