Cornell University Library

eCommons


Markov chain, Markov decision process, and deep reinforcement learning with applications to hospital management and real-time ride-hailing

File(s)
Feng_cornellgrad_0058F_12263.pdf (762.98 KB)
Permanent Link(s)
https://doi.org/10.7298/0x9s-6r20
https://hdl.handle.net/1813/102998
Collections
Cornell Theses and Dissertations
Author
Feng, Jiekun
Abstract

This dissertation summarizes research on Markov chains, Markov decision processes, and deep reinforcement learning. Motivated by customer queues in call centers and hospital inpatient wards, the stationary distributions of continuous- and discrete-time Markov chains carry important operational implications. Because these stationary distributions may have no closed form, or may be time-consuming to evaluate exactly, high-quality closed-form approximations are desirable. Using Stein's method, we provide a framework to 1) construct an appropriate approximation to the stationary distribution of a Markov chain; and 2) establish a sharp upper bound on the approximation error. Extending Markov chains to Markov decision processes, we develop a framework to model a common yet challenging problem faced by real-time ride-hailing platforms such as Uber: in anticipation of future demand and supply conditions, how to carry out car-passenger matching and empty-car routing so as to optimize a given objective over a time horizon of interest, typically several hours to a day. To cope with the immense state and action spaces entailed by the platform's complicated, time-varying environment, we 1) decompose the decision at each decision time point into a sequence of instantaneous actions, each induced by a single car or passenger; and 2) develop an algorithm that combines state-of-the-art deep learning and reinforcement learning (deep reinforcement learning) techniques.
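As a toy illustration of the object that Stein's method is used to approximate, the stationary distribution of a small discrete-time Markov chain can be computed exactly with linear algebra. The 2-state transition matrix below is hypothetical and not drawn from the dissertation, which concerns chains whose stationary distributions lack tractable exact forms.

```python
import numpy as np

# Transition matrix of a hypothetical 2-state discrete-time Markov chain.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# The stationary distribution pi solves pi P = pi subject to sum(pi) = 1,
# i.e., pi is the left eigenvector of P for eigenvalue 1. Stack the
# balance equations (P^T - I) pi = 0 with the normalization constraint
# and solve the overdetermined system by least squares.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # -> [0.8 0.2]
```

For larger chains this exact solve becomes expensive, which is precisely why the closed-form approximations developed in the dissertation are valuable.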
While recent years have witnessed the prevalence of deep reinforcement learning research with real-world impact in a number of applications, ride-hailing among them, our work is unique in two respects. First, it optimizes the long-term utility of the system rather than of individual cars: we use global demand and supply conditions as features, which do not concentrate on any single agent (a car candidate in our case); indeed, at any instantaneous decision time point we do not prescribe which car candidate to process (as the conventional first-in-first-out rule would), but let the features make the selection. Second, it learns the optimal policy directly through the reinforcement learning framework: our algorithm includes both policy evaluation and policy improvement steps over multiple policy iterations. A numerical study based on real-world data from Didi Chuxing demonstrates the efficacy of our methodology: it rapidly and stably improves the fraction of ride requests fulfilled to a level comparable to the state-of-the-art result, and after 73 policy iterations surpasses it by 3%.
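The evaluation/improvement loop described above can be sketched in its classical tabular form. This is a minimal illustration of generic policy iteration, not the dissertation's algorithm, which replaces both steps with deep-learning-based counterparts (e.g., Proximal Policy Optimization); the tiny 2-state, 2-action MDP below is hypothetical.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s']: transition probabilities; R[s, a]: expected one-step rewards.
# Both are made-up numbers for illustration only.
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

policy = np.zeros(n_states, dtype=int)  # start with action 0 in every state
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to Q(s, a).
    Q = R + gamma * (P @ V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break  # greedy policy unchanged: converged to an optimal policy
    policy = new_policy

print("optimal policy:", policy)
```

In the ride-hailing setting the state and action spaces are far too large for these tabular solves, motivating the neural-network-based evaluation and improvement steps the abstract describes.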

Description
207 pages
Date Issued
2020-08
Keywords
deep learning • Markov chain • Markov decision process • Proximal Policy Optimization • reinforcement learning
Committee Chair
Dai, Jim
Committee Member
Wells, Martin Timothy
Iyer, Krishnamurthy
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://catalog.library.cornell.edu/catalog/13277887
