Model Combinations and the Infinitesimal Jackknife: How to refine models with boosting and quantify uncertainty

Abstract

In this thesis we develop a framework for quantifying the uncertainty of model predictions using the Infinitesimal Jackknife, and we combine the principles of boosting and bagging to produce higher-quality predictions than those of currently used ensemble models.

In the first part of the thesis we apply boosting to random forests, which are bagged estimators with decision trees as base learners. Focusing on continuous responses that can be modelled by a Gaussian distribution, we show that the resulting model, the "boosted forest", has much lower bias than the "base" random forest. We also use the Infinitesimal Jackknife to provide variance estimates for the boosted forest. These estimates are slightly higher than those of the random forest, but combined with the much lower bias they give higher confidence interval coverage of the underlying signal on simulated data, and higher prediction interval coverage of the response on real datasets from the UCI database.

Next we extend the boosting procedure to responses from any exponential family. We start with an MLE-type base estimator and define generalised "residuals" with the goal of maximising the training log-likelihood via a Newton boosting step; random forests fit to these generalised residuals serve as the subsequent boosting steps. As before, we provide variance estimates for this "generalised boosted forest" using the Infinitesimal Jackknife, and the resulting confidence and prediction intervals are shown to have good coverage.

Finally we use the Infinitesimal Jackknife in its full generality to develop a covariance estimate between the predictions of any two models at any two test points (trained on the same data), from which we construct statistical tests comparing any two models. We also fully generalise the idea of boosting one model with any other model via generalised residuals for exponential-family responses, and the model comparison test can then check whether the boosting made a statistically significant difference.
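
The one-step Gaussian boosting scheme described above can be sketched in a few lines. The following is a minimal illustration, assuming scikit-learn's RandomForestRegressor and a synthetic Friedman benchmark; it is not the thesis's implementation, and it omits the Infinitesimal Jackknife variance computation entirely.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: the "base" random forest, a bagged ensemble of decision trees.
base = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
base.fit(X_tr, y_tr)

# Stage 2: boost the base forest with a second forest fit to its residuals.
# Out-of-bag residuals are one reasonable choice here, since in-sample
# residuals of a random forest are optimistically small.
residuals = y_tr - base.oob_prediction_
booster = RandomForestRegressor(n_estimators=500, random_state=1)
booster.fit(X_tr, residuals)

# The boosted-forest prediction adds the residual fit to the base fit,
# trading a small increase in variance for a large reduction in bias.
boosted_pred = base.predict(X_te) + booster.predict(X_te)
print("base MSE:   ", np.mean((y_te - base.predict(X_te)) ** 2))
print("boosted MSE:", np.mean((y_te - boosted_pred) ** 2))

For the exponential-family extension, the generalised residuals can be read as Newton-type pseudo-residuals. One standard formulation, whose details may differ from the thesis's, is the following: with $\ell(y, F)$ the log-likelihood as a function of the fit $F$, set

\[
  g_i = \left.\frac{\partial \ell(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)},
  \qquad
  h_i = \left.-\frac{\partial^2 \ell(y_i, F)}{\partial F^2}\right|_{F = F_{m-1}(x_i)},
\]
\[
  r_i = g_i / h_i,
  \qquad
  F_m(x) = F_{m-1}(x) + f_m(x),
\]

where $f_m$ is a random forest fit to the $r_i$ (optionally with weights $h_i$), so that each step is an approximate Newton ascent on the training log-likelihood. For a Gaussian response, $g_i = (y_i - F_{m-1}(x_i))/\sigma^2$ and $h_i = 1/\sigma^2$, so $r_i$ reduces to the ordinary residual used above.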

Description

203 pages

Date Issued

2021-08

Keywords

Infinitesimal Jackknife; Interpretable machine learning; Model Comparison; Newton boosting; U-statistics; Uncertainty Quantification

Committee Chair

Hooker, Giles J.

Committee Member

Basu, Sumanta
Sridharan, Karthik

Degree Discipline

Statistics

Degree Name

Ph.D., Statistics

Degree Level

Doctor of Philosophy

Rights

Attribution-NonCommercial 4.0 International

Types

dissertation or thesis
