Model Combinations and the Infinitesimal Jackknife: How to refine models with boosting and quantify uncertainty
In this thesis we develop a framework for quantifying the uncertainty of model predictions using the Infinitesimal Jackknife, and we combine the principles of boosting and bagging to produce higher-quality predictions than those from currently used ensemble models.

In the first part of the thesis we apply boosting to random forests, which are bagged estimators with decision trees as base learners. Focusing on continuous responses that can be modelled by a Gaussian distribution, we show that this new model, the "boosted forest", has much lower bias than the "base" random forest. We also use the Infinitesimal Jackknife to provide variance estimates for the boosted forest. These variance estimates are slightly higher than those of random forests, but combined with the much lower bias they yield higher confidence interval coverage of the underlying signal on simulated data, and higher prediction interval coverage of the response on real datasets from the UCI database.

Next we extend the boosting procedure to responses from any general exponential family. We start with an MLE-type base estimator and then define generalised "residuals" with the goal of maximising the training log-likelihood via a Newton boosting step. We can then fit random forests in further boosting steps, with these generalised residuals as the response. As before, we provide variance estimates for this "generalised boosted forest" model using the Infinitesimal Jackknife, and the resulting confidence and prediction intervals are shown to have good coverage.

Finally we leverage the Infinitesimal Jackknife in its full generality to develop a covariance measure between any two models and any two test points (fit on the same training data). This covariance measure can then be used to construct statistical comparison tests between any two models.
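To illustrate the boosting-a-forest idea described above, here is a minimal NumPy sketch; it is not the thesis's implementation. Bagged regression stumps stand in for full random forests, and all function names (fit_stump, fit_forest, etc.) are hypothetical. A first forest is fit to the response, a second forest is fit to its residuals, and the boosted prediction is their sum; the Infinitesimal Jackknife variance machinery is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    """Fit a depth-1 regression tree (stump) on a 1-d feature x."""
    best_sse, best = np.inf, None
    for s in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = left.var() * len(left) + right.var() * len(right)
        if sse < best_sse:
            best_sse, best = sse, (s, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    s, lo, hi = stump
    return np.where(x <= s, lo, hi)

def fit_forest(x, y, n_trees=100):
    """A stand-in 'forest': bagged stumps, one bootstrap resample each."""
    n = len(x)
    return [fit_stump(x[idx], y[idx])
            for idx in (rng.integers(0, n, n) for _ in range(n_trees))]

def predict_forest(forest, x):
    return np.mean([predict_stump(t, x) for t in forest], axis=0)

# toy data: smooth signal plus Gaussian noise
x = rng.uniform(-3, 3, 400)
y = np.sin(x) + rng.normal(0, 0.3, 400)

# stage 1: base forest; stage 2: forest fit to the stage-1 residuals
f1 = fit_forest(x, y)
res = y - predict_forest(f1, x)
f2 = fit_forest(x, res)

# the "boosted forest" prediction is the sum of the two stages
boosted = lambda xt: predict_forest(f1, xt) + predict_forest(f2, xt)
```

Because the second stage targets what the first stage missed, the training error of the boosted predictor is lower than that of the base forest, mirroring the bias reduction reported in the abstract.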
We also fully generalise the idea of boosting one model with any other model via generalised residuals, for responses from the exponential family, and we can use the model comparison test to check whether the boosting made a statistically significant difference.
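As a toy illustration of the generalised-residual Newton boosting step, and again not the thesis's actual code, the sketch below works through one step for a Bernoulli response with logit link. The base estimator is the MLE-type constant (the logit of the sample mean), the generalised residual is the gradient/Hessian ratio (y - p)/(p(1 - p)), and a single regression stump (a hypothetical stand-in for a forest) is fit to those residuals to update the fit; a damping factor of 0.5 on the step is an assumption for stability.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik(y, f):
    """Bernoulli log-likelihood on the logit scale f."""
    return float(np.sum(y * f - np.log1p(np.exp(f))))

def fit_stump(x, r):
    """Least-squares regression stump on the generalised residuals r."""
    best_sse, best = np.inf, None
    for s in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        l, rgt = r[x <= s], r[x > s]
        if len(l) == 0 or len(rgt) == 0:
            continue
        sse = ((l - l.mean()) ** 2).sum() + ((rgt - rgt.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (s, l.mean(), rgt.mean())
    return best

def predict_stump(stump, x):
    s, lo, hi = stump
    return np.where(x <= s, lo, hi)

# toy binary data with a logistic signal
x = rng.uniform(-2, 2, 500)
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-2 * x))).astype(float)

# step 0: MLE-type constant base estimator (logit of the sample mean)
f = np.full(500, np.log(y.mean() / (1 - y.mean())))
ll0 = loglik(y, f)

# one Newton boosting step: generalised residuals = gradient / Hessian
p = 1 / (1 + np.exp(-f))
resid = (y - p) / (p * (1 - p))        # (y - p)/(p(1 - p)) for Bernoulli
f = f + 0.5 * predict_stump(fit_stump(x, resid), x)  # damped Newton step
ll1 = loglik(y, f)
```

Fitting the stump to the generalised residuals and adding it on the link scale increases the training log-likelihood, which is the criterion the Newton boosting step targets; in the thesis this role is played by a random forest rather than a single stump.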
Infinitesimal Jackknife; Interpretable machine learning; Model Comparison; Newton boosting; U-statistics; Uncertainty Quantification
Hooker, Giles J.
Basu, Sumanta; Sridharan, Karthik
Ph.D., Statistics
Doctor of Philosophy
Attribution-NonCommercial 4.0 International
dissertation or thesis
Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International