Interpretable Approaches to Opening Up Black-Box Models

Abstract
In critical domains such as healthcare, finance, and criminal justice, knowing only what a machine learning model predicted, and not why, may be insufficient to deploy it. This dissertation proposes new methods to open up black-box models, with the goal of helping both creators and users of machine learning models increase their trust in and understanding of the models.

The first part of this dissertation proposes new post-hoc, global explanations for black-box models, developed either through model-agnostic distillation techniques or by leveraging known structure specific to the black-box model. First, we propose a distillation approach for learning global additive explanations that describe the relationship between input features and model predictions; a user study with expert users shows that these distilled additive explanations offer fidelity, accuracy, and interpretability advantages over non-additive explanations. Second, focusing on tree ensembles, we leverage tree structure to construct a similarity metric for gradient boosted tree models. We use this metric to select prototypical observations in each class, providing an alternative to other tree ensemble interpretability methods such as seeking a single tree that best represents the ensemble or computing feature importances.

The second part of this dissertation studies the use of interpretability approaches to probe and debug black-box models in algorithmic fairness settings. Here, "black box" takes on another meaning: many risk-scoring models used in high-stakes decisions such as credit scoring and judicial bail are proprietary and opaque, and do not lend themselves to easy inspection or validation. We propose Distill-and-Compare, an approach that probes such risk-scoring models by leveraging additional information on the ground-truth outcomes the models were intended to predict. We find that interpretability approaches can help uncover previously unknown sources of bias.

Finally, we present a concrete case study that uses the interpretability methods proposed in this dissertation to debug a black-box model: a hybrid Human + Machine recidivism prediction model. Our methods revealed that human and COMPAS decision making anchored on the same features, and hence did not differ enough to realize the promise of hybrid Human + Machine decision making. This case study concludes the dissertation's treatment of interpretability approaches in real-world settings.
Keywords
statistics; black-box models; explanations; tree ensembles; computer science; interpretability; machine learning; fairness
Committee Chair
Hooker, Giles J.
Committee Member
Wells, Martin Timothy
Joachims, Thorsten
Caruana, Rich A.
Degree Name
Ph.D., Statistics
Degree Level
Doctor of Philosophy
Type
dissertation or thesis