Intelligible Models: Recovering Low Dimensional Additive Structure For Machine Learning Models
Different supervised learning models make different bias-variance tradeoffs. For low-dimensional problems, low-bias models such as boosted trees or SVMs with RBF kernels are very accurate, but users can no longer interpret them. For high-dimensional problems, high-bias models such as regularized linear or logistic regression are usually preferred because of the curse of dimensionality and the exponentially growing hypothesis space, but it is not clear whether their accuracy can be improved further. Additive modeling controls bias and variance at a finer granularity and offers a solution to both problems. Generalized additive models (GAMs) express the hypothesis as a sum of components, where each component can include any number of variables. Therefore, by prudently selecting the components or restricting the number of complex components, and by carefully controlling the complexity of each selected component, GAMs can flexibly model hypotheses with different biases. This dissertation presents a family of additive models called intelligible models, which effectively recover low-dimensional additive structure. Because each low-dimensional component is simple, data scientists can investigate it individually, which significantly improves interpretability. We first present a large-scale empirical study of various methods for fitting GAMs and demonstrate that gradient boosting with shallow bagged trees yields the best accuracy. In addition, we propose a very efficient method for detecting pairwise feature interactions that scales to thousands of features. With a large-scale empirical study, we show that models with low-dimensional additive components (one- and two-dimensional components) are as accurate as complex models such as random forests.
Finally, we develop a method that carefully controls the complexity of intelligible models through feature selection and by deciding whether each selected term is linear or nonlinear. We show that on high-dimensional problems, allowing a small set of features to act nonlinearly further improves accuracy over the popular linear models.
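As an illustration of the additive form described in the abstract, the following minimal sketch fits a model made only of one-dimensional components by cycling gradient boosting over the features. It is not the dissertation's implementation: it assumes squared loss, uses single-split stumps as the shallow trees, and omits bagging and pairwise interaction terms. The names `fit_stump`, `fit_additive_model`, `shape_value`, and `predict` are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares single-split stump on one feature x for residuals r.

    Returns (threshold, left_value, right_value); inputs <= threshold
    receive left_value, all others receive right_value.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(r)
    best = None  # (gain, threshold, left_mean, right_mean)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += r[order[k - 1]]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot split between equal feature values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            best = (gain, 0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    if best is None:  # constant feature: fall back to the residual mean
        return (float("inf"), total / n, total / n)
    return best[1:]


def shape_value(stumps, v):
    """Evaluate one feature's component (a sum of stumps) at value v."""
    return sum(left if v <= thr else right for thr, left, right in stumps)


def fit_additive_model(X, y, rounds=50, learning_rate=0.3):
    """Fit y ~ intercept + sum_j f_j(x_j) by cyclic gradient boosting."""
    n, d = len(X), len(X[0])
    intercept = sum(y) / n
    residual = [t - intercept for t in y]
    columns = [[row[j] for row in X] for j in range(d)]
    shapes = [[] for _ in range(d)]  # one list of stumps per feature
    for _ in range(rounds):
        for j in range(d):
            thr, left, right = fit_stump(columns[j], residual)
            left *= learning_rate
            right *= learning_rate
            shapes[j].append((thr, left, right))
            # Under squared loss the negative gradient is the residual.
            residual = [
                res - (left if v <= thr else right)
                for res, v in zip(residual, columns[j])
            ]
    return intercept, shapes


def predict(model, row):
    """Prediction is the intercept plus each feature's component value."""
    intercept, shapes = model
    return intercept + sum(shape_value(s, v) for s, v in zip(shapes, row))
```

Because each entry of `shapes` depends on a single feature, its component can be tabulated or plotted on its own by calling `shape_value` over a grid of values, which is the kind of per-component inspection the abstract refers to.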
intelligible models; classification and regression; interaction detection
Gehrke, Johannes E.
Kozen, Dexter Campbell; Snavely, Keith Noah; Caruana, Rich A.; Hooker, Giles J.
Ph. D., Computer Science
Doctor of Philosophy
dissertation or thesis