Machine Learning Methods for Data-driven Decision Making: Contextual Optimization, Causal Inference, and Algorithmic Fairness
Recent advances in machine learning (ML) hold much promise for using data to drive more effective decisions. However, many challenges remain to realizing this decision-making potential, given the limitations of predictive algorithms and imperfections in the available data. This thesis investigates these critical challenges in the areas of data-driven optimization, causal inference, and algorithmic fairness, and develops fundamental theory and new ML methods.

In Part I, we focus on data-driven optimization involving both uncertain quantities of interest (e.g., demand) and predictive contextual features (e.g., product characteristics). Chapters 2 and 3 study stochastic optimization problems and investigate two popular paradigms: an "estimate-then-optimize" paradigm, which first uses standard ML methods to predict the distribution of the uncertain quantities given contextual features and then plugs the distributional predictions into a stochastic optimization problem to solve for decisions; and an "end-to-end" paradigm, which integrates prediction and decision making by directly training predictive models to target good decisions. In Chapter 2, we develop an end-to-end stochastic optimization forest algorithm that constructs decision trees to directly optimize decision quality, which we show provides significant benefits for decision making over building trees that target prediction accuracy. In Chapter 3, we reveal a more nuanced landscape for the integration of estimation and optimization by identifying many common settings where end-to-end approaches can actually have much slower regret-convergence rates than the far simpler estimate-then-optimize approach. Chapter 4 considers the online setting, where we collect more data as we make decisions, and studies nonparametric contextual bandits with smooth expected-reward functions. We develop a novel algorithm that leverages this smoothness structure and show that its regret rate is minimax optimal.
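To make the contrast between the two Part I paradigms concrete, here is a minimal sketch on a toy newsvendor problem. Everything here — the data-generating process, the linear decision rule, and all names — is an illustrative assumption, not taken from the thesis itself.

```python
# Toy contrast of "estimate-then-optimize" vs. "end-to-end" on a
# newsvendor problem (illustrative assumptions throughout).
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n, cu, co = 2000, 4.0, 1.0                       # underage / overage unit costs
x = rng.uniform(0.0, 1.0, n)                     # contextual feature
demand = 10 * x + rng.normal(0.0, 1.0 + 2 * x)   # heteroskedastic demand
X = np.column_stack([np.ones(n), x])

def avg_cost(order):
    """Empirical newsvendor cost of an order vector."""
    return np.mean(cu * np.maximum(demand - order, 0.0)
                   + co * np.maximum(order - demand, 0.0))

# Estimate-then-optimize: fit a conditional-mean model, assume
# homoskedastic Gaussian noise, then order at the critical fractile.
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
sd = np.std(demand - X @ beta)
z = NormalDist().inv_cdf(cu / (cu + co))         # critical-fractile quantile
eto_cost = avg_cost(X @ beta + z * sd)

# End-to-end: choose the linear ordering rule that directly minimizes
# empirical newsvendor cost (a pinball loss), via subgradient descent.
theta = np.zeros(2)
for _ in range(3000):
    grad = X.T @ np.where(demand > X @ theta, -cu, co) / n
    theta -= 0.05 * grad
e2e_cost = avg_cost(X @ theta)
```

Because the noise here is heteroskedastic, the plug-in homoskedastic model places its order quantile incorrectly per context, while the end-to-end rule minimizes the decision cost directly, so `e2e_cost` comes out no larger than `eto_cost` in sample.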
Our regret analysis reveals the full spectrum of relationships between regret in contextual bandits and the smoothness of the reward function, recovering existing results for Lipschitz and parametric reward functions at the two extremes.

In Part II, we study causal inference with observational data in complex settings. In Chapter 5, we consider the estimation of and inference on complex causal parameters, such as quantile treatment effects, whose efficient estimation requires learning nuisance functions that depend on the parameter itself. We propose a localized debiased machine learning approach that avoids this complex dependence and relies only on simple nuisance-function estimation that can be easily outsourced to standard ML algorithms. The resulting estimators are not only practically feasible but also theoretically grounded, with asymptotically optimal distributions under weak conditions. In Chapter 6, we tackle the common challenge that some confounders cannot be measured exactly and only noisy proxy observations of them are available. We propose using matrix factorization to infer confounders from noisy proxies and then estimating causal effects based on the inferred confounders. This provides a flexible and principled framework that adapts to missing values, accommodates many data types, and can enhance a wide variety of causal inference methods.

In Part III, we tackle a prevalent challenge in assessing the fairness of decision-making algorithms with respect to a protected class (e.g., race or ethnicity): the protected class is often unobserved in practice. In Chapter 7, we analyze the bias of proxy methods that impute class labels with ML algorithms and that have been extensively applied in consumer-finance and healthcare contexts. This is the first rigorous analysis of how such proxy methods can lead to biased disparity assessments.
In Chapter 8, we prove that exactly measuring decision disparity without class labels is fundamentally impossible, and propose algorithms that estimate and visualize the tightest possible set of true disparity values consistent with the observed data. Our proposal thus provides a robust and reliable fairness-auditing tool that fully accounts for the inherent ambiguity in disparity assessment due to missing protected classes.
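As a stylized illustration of the Part III partial-identification idea, the classical Fréchet-Hoeffding inequalities already show how a disparity can be bounded, but not pinned down, when only the marginals of the decision and a binary protected class are observed. This sketch is a simplification with made-up numbers; it is not the thesis's algorithm, which further tightens such sets using proxy data.

```python
# Stylized sketch: without joint observations of a binary decision Yhat
# and a binary protected class A, the demographic disparity is only
# set-identified. The marginal probabilities below are invented.
def disparity_bounds(p_d, p_a):
    """Bounds on P(Yhat=1|A=a) - P(Yhat=1|A=b) given only the marginals
    p_d = P(Yhat=1) and p_a = P(A=a), via Frechet-Hoeffding bounds on
    the unobserved joint probability q = P(Yhat=1, A=a)."""
    q_lo = max(0.0, p_d + p_a - 1.0)
    q_hi = min(p_d, p_a)

    def disparity(q):
        # Disparity as a function of q; increasing in q, so the
        # endpoints of [q_lo, q_hi] give the tightest interval.
        return q / p_a - (p_d - q) / (1.0 - p_a)

    return disparity(q_lo), disparity(q_hi)

lo, hi = disparity_bounds(p_d=0.3, p_a=0.4)   # -> (-0.5, 0.75)
```

With these made-up marginals, the true disparity could be anywhere from -0.5 to 0.75 — a wide interval that even leaves the sign ambiguous, which is exactly why an auditing tool must report the whole identified set rather than a single point estimate.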
Algorithmic Fairness; Causal Inference; Data-driven Decision Making; Data-driven Optimization; Machine Learning
Udell, Madeleine Richards; Frazier, Peter; Joachims, Thorsten
Ph.D., Statistics
Doctor of Philosophy
dissertation or thesis