Structural Results for Constrained Markov Decision Processes

Other Titles


In the existing literature on the dynamic control of service systems, a decision-maker seeks to optimize a single performance metric over a given time-horizon. However, in many settings, the decision-maker may be interested in multiple performance metrics. Take, for instance, the problem of assigning cross-trained hospital staff to two classes of patients: low-priority and high-priority. In the typical framework, this problem could be modelled as a Markov Decision Process (MDP), in which the performance metric to be minimized is a weighted combination of expected waiting times for each class. However, we argue that a more natural approach is to consider the constrained problem: minimizing the expected waiting time for lower priority patients, while keeping that of higher priority patients under a given target, V. In particular, we concern ourselves with uncovering structural properties of this problem. These properties imply the existence of simple optimal policies that are easy to implement in practice. We formulate this problem (the parallel setting), as well as a related problem in which customers undergo two phases of service in series (the tandem setting), as Constrained Markov Decision Processes (CMDPs). We present a general framework for solving two-class CMDPs, showing that they can be solved by using the Lagrangian dual to specify a particular unconstrained problem. If an appropriate Lagrange multiplier can be discerned, structural results from the resulting Lagrangian relaxation can be used to exploit structure in the original CMDP. We show that for both the parallel and tandem settings, the framework leads to simple threshold-like optimal policies. The results in each case are used to develop heuristics for analogous problems with abandonments with applications to healthcare, call centers and manufacturing systems. The efficacy of the heuristics are verified in each case via a detailed numerical study. We then extend the results in the parallel case to handle multiple classes and constraints. Lastly, we consider a controlled, truncated birth-death chain motivated by optimal treatment prescription in the context of personalized medicine. In this model, states represent the patient's state of health, and treatments can be prescribed to influence improvement and/or deterioration of health. The problem of dynamically prescribing treatments at minimal cost while maintaining a given level of health is modelled as a two-cost CMDP. Rather than employing the more general methods developed earlier, we decompose the state space and consider an alternative Lagrangian relaxation involving two simpler subproblems. We obtain structural results for this problem by showing that optimal solutions to these subproblems can be combined into a constrained-optimal policy. We then attempt to find conditions under which a monotone optimal policy exists under more general transition rates between states of health.

Journal / Series

Volume & Issue



Date Issued




applied probability; Operations research; queueing theory; constrained optimization; dynamic programming; Markov decision processes; Optimal control


Effective Date

Expiration Date




Union Local


Number of Workers

Committee Chair

Lewis, Mark E.

Committee Co-Chair

Committee Member

Lewis, Adrian S.
Henderson, Shane G.

Degree Discipline

Operations Research

Degree Name

Ph. D., Operations Research

Degree Level

Doctor of Philosophy

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)


Link(s) to Reference(s)

Previously Published As

Government Document




Other Identifiers


Rights URI


dissertation or thesis

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record