Online learning is the process of making accurate predictions and optimizing actions sequentially, in each period using the information gained from previous decisions and observations. In many real-world problems, the underlying model is unknown and possibly stochastic, so actions cannot be optimized analytically. The goal of online learning is to learn from past observations as quickly as possible so as to minimize the loss incurred by not knowing the true underlying model. Because uncertainty plays a major role in power system operations and power consumption, optimizing actions is a challenging task for participants in wholesale electricity markets, and this gives rise to a variety of problems that call for an online learning approach. Motivated by two applications in electricity markets, we study two online learning problems. We first study the online learning and optimization of unknown Markov jump affine models, motivated by the dynamic pricing problem of an electricity retailer. An online learning policy, referred to as Markovian simultaneous perturbation stochastic approximation (MSPSA), is proposed for two optimization objectives: (i) quadratic cost minimization for the regulation problem and (ii) revenue (profit) maximization. MSPSA is shown to be an order-optimal learning policy in terms of regret growth rate: its regret grows on the order of the square root of the learning horizon, and the regret of any policy grows no more slowly. Furthermore, the MSPSA policy converges to the optimal control input almost surely as well as in the mean-square sense. Simulation results illustrate the regret growth rate of MSPSA and show that it can offer significant gains over the greedy certainty-equivalent approach.
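The simultaneous perturbation idea underlying MSPSA can be illustrated with a minimal generic SPSA sketch. This is a hedch: it omits the Markov jump affine dynamics of the thesis, and the quadratic cost, gain schedules, and dimensions below are illustrative assumptions rather than the dissertation's setup.

```python
import numpy as np

def spsa(cost, theta0, iters=5000, a=0.5, A=10.0, c=0.1, seed=0):
    """Generic SPSA: estimate the gradient from only two cost evaluations
    along a random +/-1 perturbation direction, regardless of dimension."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, iters + 1):
        a_k = a / (k + A) ** 0.602      # step-size gain (standard SPSA schedule)
        c_k = c / k ** 0.101            # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # two-sided difference; note 1/delta_i == delta_i for +/-1 perturbations
        g = (cost(theta + c_k * delta) - cost(theta - c_k * delta)) / (2 * c_k) * delta
        theta = theta - a_k * g
    return theta

# Illustrative quadratic regulation-style cost with assumed optimum theta* = [1, -2]
theta_star = np.array([1.0, -2.0])
cost = lambda th: float(np.sum((th - theta_star) ** 2))
theta_hat = spsa(cost, theta0=[0.0, 0.0])
```

The appeal of the simultaneous perturbation estimate is that its cost is two function evaluations per iteration independent of the parameter dimension, which is what makes it attractive when the model (here, the demand response to price) is unknown.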
Motivated by virtual trading in two-settlement wholesale electricity markets, the second problem we consider is the online learning of an optimal bidding strategy in repeated multi-commodity auctions. A polynomial-time online learning algorithm is proposed that maximizes the cumulative payoff over a finite horizon by allocating the bidder's budget among his bids for K options in each period. The proposed algorithm, referred to as dynamic programming on discrete set (DPDS), achieves a regret of order O(√(T log T)). By showing that the regret of any strategy is lower bounded by Ω(√T), we conclude that DPDS is order optimal up to a √(log T) term. Our result also implies that the expected payoff of DPDS converges, at an almost optimal rate, to the expected payoff of the global optimum corresponding to the case in which the underlying model is known. Using both cumulative payoff and the Sharpe ratio as performance metrics, the proposed strategy is evaluated on historical data spanning a ten-year period of the NYISO and PJM energy markets, where it outperforms standard benchmarks as well as the S&P 500 index over the same period.
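The budget-allocation step at the heart of DPDS can be illustrated with a small dynamic program over a discretized budget grid. This is a hedged sketch under simplifying assumptions: the payoff tables, integer budget units, and the function name `allocate_budget` are illustrative, not the thesis's estimator of auction payoffs.

```python
def allocate_budget(payoffs, budget):
    """Maximize total payoff by splitting an integer budget among K options.
    payoffs[k][a] is the (estimated) payoff of giving a units to option k.
    Returns (best_value, allocation) via DP on the discretized budget grid."""
    K = len(payoffs)
    # dp[b] = best payoff achievable with at most b units over options seen so far
    dp = [0.0] * (budget + 1)
    choice = [[0] * (budget + 1) for _ in range(K)]
    for k in range(K):
        new = [float("-inf")] * (budget + 1)
        for b in range(budget + 1):
            for a in range(b + 1):
                v = dp[b - a] + payoffs[k][a]
                if v > new[b]:
                    new[b] = v
                    choice[k][b] = a
        dp = new
    # backtrack the optimal per-option allocation
    alloc, b = [0] * K, budget
    for k in range(K - 1, -1, -1):
        alloc[k] = choice[k][b]
        b -= alloc[k]
    return dp[budget], alloc

# Illustrative two-option example with 3 budget units
value, alloc = allocate_budget([[0, 5, 6, 6.5], [0, 4, 7, 8]], 3)
# → value 12.0 with allocation [1, 2]
```

With K options and B budget points, the recursion runs in O(K·B²) time, consistent with the polynomial-time claim; in the online setting the payoff tables would be replaced by estimates updated from past auction outcomes.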
dynamic pricing; electricity markets; Electrical engineering; Algorithmic Bidding; Continuum-Armed Bandit; Online Machine Learning; Sequential Decision Making
Committee Chair
Tong, Lang
Committee Member
Wagner, Aaron B.
Zhao, Qing
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph. D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
dissertation or thesis