Model Combinations and the Infinitesimal Jackknife: How to refine models with boosting and quantify uncertainty

In this thesis we aim to create a framework for quantifying the uncertainty of model predictions using the Infinitesimal Jackknife technique. We also aim to combine the principles of boosting and bagging to create higher-quality predictions than those from currently used ensemble models.

In the first part of the thesis we use boosting with random forests, which are bagged estimators with decision trees as base learners. Focusing on continuous responses that can be modelled by a Gaussian distribution, we show that this new model, called the "boosted forest", has much lower bias than the "base" random forest. We also use the Infinitesimal Jackknife to provide variance estimates for the boosted forest. These variance estimates are slightly higher than those of the random forest, but combined with the much lower bias we obtain higher confidence-interval coverage of the underlying signal for simulated data, and higher prediction-interval coverage of the response for real datasets from the UCI database.

Next we extend the boosting procedure to responses from any exponential family. We start with an MLE-type base estimator and define generalised "residuals" with the goal of maximising the training log-likelihood via a Newton boosting step. We can then fit random forests for further boosting steps, with these generalised residuals as the response. As above, we provide variance estimates for this "generalised boosted forest" model using the Infinitesimal Jackknife, and the resulting confidence and prediction intervals are shown to have good coverage.

Finally, we leverage the Infinitesimal Jackknife in its full generality to develop a covariance measure between any two models and any two test points (for models trained on the same data). We can then use this covariance measure to construct statistical comparison tests between any two models.
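The boosted-forest construction described in the abstract can be illustrated with a minimal sketch: fit a random forest, then fit a second forest to the first forest's residuals, and predict with their sum. This uses scikit-learn's `RandomForestRegressor` as a stand-in for the thesis implementation; the synthetic data, signal, and forest sizes are illustrative, and the Infinitesimal Jackknife variance machinery is not shown here.

```python
# A minimal sketch of one boosting step on a random forest ("boosted forest").
# Assumptions: synthetic Gaussian-noise data; scikit-learn forests stand in
# for the thesis implementation; forest sizes chosen only for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)  # signal + Gaussian noise

# Stage 1: the "base" random forest.
base = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stage 2: a second forest fit to the base forest's residuals.
residuals = y - base.predict(X)
boost = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, residuals)

def boosted_forest_predict(X_new):
    """One boosting step: base prediction plus residual-forest correction."""
    return base.predict(X_new) + boost.predict(X_new)
```

The correction term targets exactly the systematic error of the base forest, which is why the combined model has lower bias, at the cost of somewhat higher variance.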
We also fully generalise the concept of boosting one model with any other model based on generalised residuals, for responses from the exponential family, and we use the model comparison test to check whether boosting made a statistically significant difference.
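The generalised "residuals" used in the Newton boosting step can be sketched for the Bernoulli case with a logit link: the residual is the gradient of the log-likelihood with respect to the linear predictor divided by the (negative) Hessian. The function name, the constant-intercept MLE base estimator, and the tiny data vector below are all hypothetical, purely for illustration.

```python
# A hedged sketch of Newton-boosting "generalised residuals" for a
# Bernoulli response with the logit link. The base estimator here is the
# constant intercept MLE, purely for illustration.
import numpy as np

def bernoulli_newton_residuals(y, eta):
    """Gradient / Hessian of the Bernoulli log-likelihood w.r.t. eta.

    These residuals would be fed to the next random forest as its response.
    """
    p = 1.0 / (1.0 + np.exp(-eta))      # mean under the logit link
    grad = y - p                        # d loglik / d eta
    hess = p * (1.0 - p)                # -d^2 loglik / d eta^2
    return grad / np.clip(hess, 1e-6, None)

y = np.array([0, 0, 1, 1, 1], dtype=float)
# MLE intercept on the linear-predictor scale: logit of the sample mean.
eta0 = np.full_like(y, np.log(y.mean() / (1 - y.mean())))
r = bernoulli_newton_residuals(y, eta0)
```

For Gaussian responses the Hessian is constant, so these residuals reduce to the ordinary residuals used in the boosted forest; the exponential-family machinery only changes what "residual" means at each step.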

203 pages


Infinitesimal Jackknife; Interpretable machine learning; Model Comparison; Newton boosting; U-statistics; Uncertainty Quantification


Committee Chair

Hooker, Giles J.

Committee Member

Basu, Sumanta
Sridharan, Karthik

Degree Name

Ph.D., Statistics

Degree Level

Doctor of Philosophy

Attribution-NonCommercial 4.0 International


dissertation or thesis
