eCommons

 

DATA SUBSAMPLING FOR MODEL SELECTION IN AUTOML FRAMEWORKS

dc.contributor.author: Nayar, Nandini
dc.contributor.chair: Udell, Madeleine Richards
dc.contributor.committeeMember: Hariharan, Bharath
dc.date.accessioned: 2021-09-09T17:38:02Z
dc.date.available: 2021-09-09T17:38:02Z
dc.date.issued: 2021-05
dc.description: 42 pages
dc.description.abstract: This project studies how data subsampling can be used to perform model selection. Most commonly used model selection methods train every candidate model on the entire training data several times in order to pick the best one, which is often one of the most computationally expensive aspects of model selection. It would therefore be valuable to understand how resources can be better allocated to pick the best model for a given dataset. This project explores how to optimize resource allocation for model selection by subsampling data. We try three approaches to model selection: (1) a randomized multi-armed bandit approach, (2) subsampling using influence functions, and (3) a new boosting-based method referred to here as iterative boosting. The first method is evaluated on 10 tabular datasets, while the other two use the MNIST and CIFAR-10 image datasets and deep learning models. Each approach rests on its own set of assumptions, which bring pros and cons for the intended task of model selection. We analyze the three methods to better understand how to choose meaningful subsets of data that accurately estimate a model’s relative test performance. The Hyperband method for subsampling seems to be the most effective in terms of both computational complexity and recovering good relative model performance. The iterative boosting method shows some promise on MNIST but requires more work to become significantly better than random subsampling on more complex datasets such as CIFAR-10.
dc.identifier.doi: https://doi.org/10.7298/vwhw-cv46
dc.identifier.other: Nayar_cornell_0058O_11145
dc.identifier.other: http://dissertations.umi.com/cornell:11145
dc.identifier.uri: https://hdl.handle.net/1813/109675
dc.language.iso: en
dc.subject: Artificial Intelligence
dc.subject: AutoML
dc.subject: Data
dc.subject: Machine Learning
dc.subject: Model Selection
dc.subject: Subsampling
dc.title: DATA SUBSAMPLING FOR MODEL SELECTION IN AUTOML FRAMEWORKS
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Cornell University
thesis.degree.level: Master of Science
thesis.degree.name: M.S., Computer Science

Files

Original bundle
Name: Nayar_cornell_0058O_11145.pdf
Size: 3.31 MB
Format: Adobe Portable Document Format