The field of computer vision has benefited tremendously from an unusual blessing: a baseline that works quite well on almost every problem, is extremely simple to implement, and is sometimes shockingly hard to surpass. The existence of such a baseline runs counter to the history of vision; prior to deep learning, approaches typically required careful design to obtain non-trivial performance on many benchmarks. This baseline is the process of pretraining and finetuning: initially training a neural network on a large dataset to perform some task, and then applying components of that network to new tasks with minimal modifications or additions. Specifically, the feature extractor (dimensionality-reducing module) of the initial network is re-used, with only a small task-specific output network or layer needing to be trained from scratch. In the last decade, this simple process has served as a building block for the state of the art on numerous vision benchmarks, sometimes with little to no modification. For all of its success, however, this method is still understudied and under-developed. Until recently, there was very little effort to determine which task to pretrain this network on; supervised training on the ImageNet classification benchmark was simply considered sufficient. There has been some study of why this transfer occurs, but relatively few attempts to improve the transfer, especially from a methodological rather than a parameter-optimization perspective. In this thesis, I consider from multiple perspectives the problem of using prior knowledge and pretraining when given a new collection of images. Chapter 1 introduces the method of pretraining and finetuning in further detail and summarizes the contributions of this work.
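The recipe described above can be sketched in a few lines. In this illustrative toy (not code from the thesis), a fixed random projection stands in for the pretrained feature extractor, and only a small task-specific head is trained from scratch on the frozen features; all names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor: a fixed (frozen) map.
# In a real pipeline this would be, e.g., an ImageNet-pretrained network
# with its classification layer removed.
W_backbone = rng.normal(size=(64, 16)) / np.sqrt(64)

def extract_features(x):
    """Frozen backbone: its parameters are never updated during finetuning."""
    return np.maximum(x @ W_backbone, 0.0)  # linear map + ReLU

# Toy target task: 200 "images" as 64-d vectors with binary labels.
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the task-specific head (a logistic regression on the frozen
# features) is trained from scratch, by plain gradient descent.
F = extract_features(X)
w_head = np.zeros(16)
b_head = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head + b_head)))  # sigmoid
    grad = p - y                                      # d(loss)/d(logit)
    w_head -= lr * F.T @ grad / len(y)
    b_head -= lr * grad.mean()

# Training accuracy of the head on top of the frozen features.
train_acc = (((F @ w_head + b_head) > 0) == (y > 0.5)).mean()
```

Swapping the random projection for a network actually pretrained on a large dataset is what gives the baseline its strength; the training loop for the head is unchanged.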
In Chapter 2, I consider a variety of self-supervised learning methods on an extremely varied set of domains, with the goal of understanding what signals exist to learn, and which signals are exploited or left unexploited by pretraining on or off the domain. I also consider how the pretraining and finetuning process compares to domain-specific self-supervised learning. This work was published in ECCV 2020. In Chapter 3, we address the problem of improving the feature extraction process directly. We present a novel architecture and algorithm for the self-supervised ensembling of pretrained networks on a new dataset. This method dramatically improves nearest-neighbor classification performance on the target datasets, and is even applicable in the single-model (non-ensemble) setting. This work was performed during an internship at Salesforce Research under the mentorship of Devansh Arpit and is currently under review. In Chapter 4, we consider the improvement of transfer learning through domain-specific pretraining. The task considered, called expert selection, deals with choosing the optimal pretraining category out of a given set. Here we match the previous state of the art's fully supervised performance despite our method not requiring labels on the target or source domains. Additionally, we present a more general form of the method that requires weaker assumptions about the existence of powerful pretrained networks. This work was part of a collaboration with Ziyang Wu during his time as a Master's student at Cornell and was published in CVPR 2021.
177 pages
Committee Chair
Hariharan, Bharath
Committee Member
Wegkamp, Marten H.
Frazier, Peter
Degree Discipline
Applied Mathematics
Degree Name
Ph. D., Applied Mathematics
Degree Level
Doctor of Philosophy
dissertation or thesis