ASYMPTOTICS AND INTERPRETABILITY OF DECISION TREES AND DECISION TREE ENSEMBLES
Decision trees and decision tree ensembles are widely used nonparametric statistical models. A decision tree is a binary tree that recursively partitions the covariate space along the coordinate directions into hyperrectangles, which serve as the basic prediction units, each fitted with a constant value. A decision tree ensemble combines multiple decision trees, either in parallel or in sequence, to increase model flexibility and accuracy and to reduce prediction variance. Although tree models have been used extensively in practice, results on their asymptotic behavior are scarce. In this thesis we present analyses of tree asymptotics from the perspectives of tree terminal nodes, tree ensembles, and models incorporating tree ensembles, respectively. Our study introduces several new tree-related learning frameworks for which we can provide provable statistical guarantees and interpretations. Our study of the Gini index used in the greedy tree building algorithm reveals its limiting distribution, leading to a test of better splitting that helps to measure the uncertainty in the optimality of a decision tree split. This test is combined with the concept of decision tree distillation, which uses a decision tree to mimic the behavior of a black box model, to generate stable interpretations by guaranteeing a unique distillation tree structure as long as there are sufficiently many random sample points. Meanwhile, we apply mild modifications and regularization to standard tree boosting to create a new boosting framework named Boulevard. Boulevard differs from the original framework chiefly in integrating two new mechanisms: honest trees, which isolate the tree terminal values from the tree structure, and adaptive shrinkage, which scales the boosting history to create an equally weighted ensemble.
With carefully chosen rates, we establish consistency and asymptotic normality for Boulevard predictions. This theoretical development provides the prerequisite for statistical inference with boosted trees. Lastly, we investigate the feasibility of combining existing semi-parametric models with tree boosting. We study the varying coefficient modeling framework with boosted trees applied as its nonparametric effect modifiers, because it generalizes several popular learning models, including partially linear regression and functional trees. We demonstrate that the new framework is not only theoretically sound, as it achieves consistency, but also empirically intelligible, as it is capable of producing comprehensible model structures and intuitive visualizations.
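As a rough illustration of the Gini index mentioned above, the following sketch computes the Gini impurity of a set of class labels and the impurity reduction (gain) of a single threshold split on one covariate. It is not taken from the thesis: the function names `gini_impurity`, `gini_gain`, and `best_split` are hypothetical, and the code yields only point estimates of the gain. The test of better splitting described in the abstract additionally accounts for the sampling variability of these quantities, which this sketch does not attempt.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(x, y, threshold):
    """Impurity reduction from splitting on x <= threshold (hypothetical helper)."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    # Child impurities are weighted by the fraction of points in each child.
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(y) - weighted


def best_split(x, y):
    """Greedy choice: the midpoint threshold with the largest Gini gain."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        return None  # all covariate values are equal, so no split exists
    return max(candidates, key=lambda t: gini_gain(x, y, t))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0, 0, 1, 1]
    t = best_split(x, y)
    print(t, gini_gain(x, y, t))
```

With a finite sample, two candidate thresholds can have nearly equal gains, so the greedy choice may flip under resampling; that instability is what motivates a formal comparison of splits.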
Statistics; Boosting; Asymptotics; Decision Tree; Interpretability; Model Distillation
Doctor of Philosophy
Attribution-ShareAlike 2.0 Generic
dissertation or thesis