eCommons

DATA SUBSAMPLING FOR MODEL SELECTION IN AUTOML FRAMEWORKS

Abstract

This project studies how data subsampling can be used to perform model selection. Most common model selection methods train every candidate model on the entire training set, often several times, in order to pick the best one; this is frequently the most computationally expensive part of model selection. It is therefore valuable to understand how computational resources can be better allocated when picking the best model for a given dataset. This project explores how to optimize that allocation by subsampling the data. We try three approaches to model selection: (1) a randomized multi-armed bandit approach based on Hyperband, (2) subsampling using influence functions, and (3) a new boosting-based method we call iterative boosting. The first method is evaluated on 10 tabular datasets, while the other two use the MNIST and CIFAR-10 image datasets with deep learning models. Each approach rests on a different set of assumptions, which gives it distinct advantages and drawbacks for model selection. We analyze the three methods to understand how to choose meaningful data subsets that accurately estimate a model's relative test performance. The Hyperband-based method appears to be the most effective, both in computational cost and in recovering good relative model performance. The iterative boosting method shows some promise on MNIST but requires further work before it is significantly better than random subsampling on more complex datasets such as CIFAR-10.
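The abstract does not spell out how a bandit method uses subsampling. As a rough illustration only, the sketch below implements successive halving, the routine that Hyperband repeats across several brackets, with training-set size as the resource: every candidate is scored on a small subsample, the weaker candidates are dropped, and the survivors are re-scored on a larger subsample. The `evaluate` callback, the toy learning curves in `CURVES`, and all parameter values are hypothetical; this is not the thesis's implementation, and full Hyperband additionally runs several such brackets with different starting subsample sizes.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: evaluate(name, n) trains candidate `name` on a
# subsample of n training points and returns a validation score
# (higher is better).
Evaluate = Callable[[str, int], float]


def successive_halving(candidates: List[str], evaluate: Evaluate,
                       min_samples: int, max_samples: int,
                       eta: int = 2) -> str:
    """Return the last surviving candidate after repeated halving rounds.

    Each round scores all survivors on n samples, keeps roughly the top
    1/eta of them (at least one), and multiplies n by eta, capped at
    max_samples.
    """
    survivors = list(candidates)
    n = min_samples
    while len(survivors) > 1:
        scores = {name: evaluate(name, n) for name in survivors}
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=scores.__getitem__, reverse=True)[:keep]
        n = min(n * eta, max_samples)
    return survivors[0]


# Toy stand-in for real training (made-up numbers): each candidate follows a
# fixed learning curve score(n) = asymptote - slope / sqrt(n).
CURVES: Dict[str, Tuple[float, float]] = {
    "logreg": (0.80, 0.5),
    "tree": (0.75, 0.3),
    "forest": (0.88, 0.6),
    "knn": (0.70, 0.2),
}


def toy_evaluate(name: str, n: int) -> float:
    asymptote, slope = CURVES[name]
    return asymptote - slope / math.sqrt(n)


best = successive_halving(list(CURVES), toy_evaluate,
                          min_samples=100, max_samples=10_000)
print(best)  # prints: forest
```

Because early rounds compare candidates only on small subsamples, a model whose learning curve starts low but rises steeply can be discarded prematurely; this is the kind of trade-off between computational cost and ranking accuracy that subsampling-based model selection has to manage.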

Description

42 pages

Date Issued

2021-05

Keywords

Artificial Intelligence; AutoML; Data; Machine Learning; Model Selection; Subsampling

Committee Chair

Udell, Madeleine Richards

Committee Member

Hariharan, Bharath

Degree Discipline

Computer Science

Degree Name

M.S., Computer Science

Degree Level

Master of Science

Types

dissertation or thesis
