Interpretable Approaches to Opening Up Black-Box Models
Tan, Hui Fen
In critical domains such as healthcare, finance, and criminal justice, merely knowing what was predicted, and not why, may be insufficient to deploy a machine learning model. This dissertation proposes new methods to open up black-box models, with the goal of helping creators, as well as users, of machine learning models increase their trust and understanding of the models. The first part of this dissertation proposes new post-hoc, global explanations for black-box models, developed using model-agnostic distillation techniques or by leveraging known structure specific to the black-box model. First, we propose a distillation approach to learn global additive explanations that describe the relationship between input features and model predictions, showing that distilled additive explanations have fidelity, accuracy, and interpretability advantages over non-additive explanations, via a user study with expert users. Second, we work specifically on tree ensembles, leveraging tree structure to construct a similarity metric for gradient boosted tree models. We use this similarity metric to select prototypical observations in each class, presenting an alternative to other tree ensemble interpretability methods such as seeking one tree that best represents the ensemble or feature importance methods. The second part of this dissertation studies the use of interpretability approaches to probe and debug black-box models in algorithmic fairness settings. Here, black-box takes on another meaning, with many risk-scoring models for high stakes decision such as credit scoring and judicial bail being proprietary and opaque, not lending themselves to easy inspection or validation. We propose Distill-and-Compare, an approach to probe such risk scoring models by leveraging additional information on ground-truth outcomes that the risk scoring model was intended to predict. We find that interpretability approaches can help uncover previously unknown sources of bias. Finally, we provide a concrete case study using the interpretability methods proposed in this dissertation to debug black-box models, in this case, a hybrid Human + Machine recidivism prediction model. Our methods revealed that human and COMPAS decision making anchored on the same features, and hence did not differ significantly enough to harness the promise of hybrid Human + Machine decision making, concluding this dissertation on interpretability approaches for real-world settings.
Statistics; black-box models; explanations; tree ensembles; Computer science; Interpretability; machine learning; fairness
Hooker, Giles J.
Wells, Martin Timothy; Joachims, Thorsten; Caruana, Rich A.
Doctor of Philosophy
dissertation or thesis