eCommons

 

Ensemble Trees and CLTs: Statistical Inference in Machine Learning

Abstract

As data grows in size and complexity, scientists are relying more heavily on learning algorithms that can adapt to underlying relationships in the data without imposing a formal model structure. These learning algorithms can produce very accurate predictions, but they act as black boxes and are therefore difficult to analyze. Classical statistical models, on the other hand, insist on a more rigid structure but are intuitive and easy to interpret. The fundamental goal of this work is to bridge these approaches by developing limiting distributions and formal statistical inference procedures for broad classes of ensemble learning methods. This is accomplished by drawing a connection between the structure of subsampled ensembles and U-statistics. In particular, we extend the existing theory of U-statistics to include infinite-order and random kernel cases and develop the relevant asymptotic theory for these new classes of estimators. This allows us to produce confidence intervals for predictions generated by supervised learning ensembles such as bagged trees and random forests. We also develop formal testing procedures for feature significance and extend these to produce hypothesis tests for additivity. When a large number of test points is required or the additive structure is particularly complex, we employ random projections and draw on recent theoretical developments. Finally, we extend these ideas further and propose an alternative permutation scheme to address the problem of variable selection with random forests.
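
The connection between subsampled ensembles and U-statistics mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `complete_u_statistic` and `subsampled_ensemble` are hypothetical, and the subsample mean is a toy stand-in for a base learner's prediction at a fixed test point (a tree would take its place in a real ensemble). A complete U-statistic averages a kernel over every size-k subset of the data, while a subsampled ensemble averages the same kernel over m randomly drawn subsets, which makes it an incomplete U-statistic.

```python
import itertools
import numpy as np

def complete_u_statistic(z, k, kernel):
    """Average the kernel over all size-k subsets of z (feasible only for small n)."""
    n = len(z)
    vals = [kernel(z[list(idx)]) for idx in itertools.combinations(range(n), k)]
    return float(np.mean(vals))

def subsampled_ensemble(z, k, m, kernel, rng):
    """Average the kernel over m random size-k subsamples drawn without replacement."""
    n = len(z)
    vals = [kernel(z[rng.choice(n, size=k, replace=False)]) for _ in range(m)]
    return float(np.mean(vals))

# Toy data; np.mean plays the role of the base learner (an assumption for illustration).
rng = np.random.default_rng(0)
z = rng.normal(size=8)

u_full = complete_u_statistic(z, k=3, kernel=np.mean)
u_sub = subsampled_ensemble(z, k=3, m=2000, kernel=np.mean, rng=rng)
```

With the mean as kernel, the complete U-statistic coincides with the full-sample mean, and the subsampled version approximates it using far fewer kernel evaluations than the full set of subsets would require.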

Date Issued

2015-08-17

Keywords

U-statistics; Random Forests; Bagging

Committee Chair

Hooker, Giles J.

Committee Member

Wegkamp, Marten H.
Wells, Martin Timothy

Degree Discipline

Statistics

Degree Name

Ph.D., Statistics

Degree Level

Doctor of Philosophy

Types

dissertation or thesis
