Structural Results for Constrained Markov Decision Processes
MetadataShow full item record
Girard, Cory Jay
In the existing literature on the dynamic control of service systems, a decision-maker seeks to optimize a single performance metric over a given time-horizon. However, in many settings, the decision-maker may be interested in multiple performance metrics. Take, for instance, the problem of assigning cross-trained hospital staff to two classes of patients: low-priority and high-priority. In the typical framework, this problem could be modelled as a Markov Decision Process (MDP), in which the performance metric to be minimized is a weighted combination of expected waiting times for each class. However, we argue that a more natural approach is to consider the constrained problem: minimizing the expected waiting time for lower priority patients, while keeping that of higher priority patients under a given target, $V$. In particular, we concern ourselves with uncovering structural properties of this problem. These properties imply the existence of simple optimal policies that are easy to implement in practice. We formulate this problem (the parallel setting), as well as a related problem in which customers undergo two phases of service in series (the tandem setting), as Constrained Markov Decision Processes (CMDPs). We present a general framework for solving two-class CMDPs, showing that they can be solved by using the Lagrangian dual to specify a particular unconstrained problem. If an appropriate Lagrange multiplier can be discerned, structural results from the resulting Lagrangian relaxation can be used to exploit structure in the original CMDP. We show that for both the parallel and tandem settings, the framework leads to simple threshold-like optimal policies. The results in each case are used to develop heuristics for analogous problems with abandonments with applications to healthcare, call centers and manufacturing systems. The efficacy of the heuristics are verified in each case via a detailed numerical study. We then extend the results in the parallel case to handle multiple classes and constraints. Lastly, we consider a controlled, truncated birth-death chain motivated by optimal treatment prescription in the context of personalized medicine. In this model, states represent the patient's state of health, and treatments can be prescribed to influence improvement and/or deterioration of health. The problem of dynamically prescribing treatments at minimal cost while maintaining a given level of health is modelled as a two-cost CMDP. Rather than employing the more general methods developed earlier, we decompose the state space and consider an alternative Lagrangian relaxation involving two simpler subproblems. We obtain structural results for this problem by showing that optimal solutions to these subproblems can be combined into a constrained-optimal policy. We then attempt to find conditions under which a monotone optimal policy exists under more general transition rates between states of health.
applied probability; Operations research; queueing theory; constrained optimization; dynamic programming; Markov decision processes; Optimal control
Lewis, Mark E.
Lewis, Adrian S.; Henderson, Shane G.
Ph. D., Operations Research
Doctor of Philosophy
dissertation or thesis