Cornell University
Library
eCommons

Learning Conditional Models for Visual Perception

File(s)
Veit_cornellgrad_0058F_10825.pdf (7.11 MB)
Permanent Link(s)
https://doi.org/10.7298/X44T6GKG
https://hdl.handle.net/1813/59494
Collections
Cornell Theses and Dissertations
Author
Veit, Andreas
Abstract

In recent years, the field of computer vision has seen a series of major advances, made possible by rapid developments in algorithms, data collection, and computing infrastructure. As a result, vision systems have started to be broadly adopted in everyday applications. Progress has been particularly promising in image recognition, where algorithms now often match human performance. Nevertheless, vision systems still fall largely behind humans in their ability to understand the complexities of the visual world and its apparent contradictions. For example, an image can carry different meanings to different people in different contexts. However, because they are often limited to a single point of view, vision systems tend to focus on the meaning that dominates in the training data. In this dissertation, we address this limitation by building conditional vision models that can learn from multiple points of view and adapt their results to account for different conditions. First, we address the related tasks of image tagging and tag-based image retrieval. In particular, we build a system that takes into account the fact that people may associate different meanings with certain images and tags. Thus, the system can personalize outputs for ambiguous tags such as #rock, which could refer to a music genre, a geological object, or even outdoor climbing. Second, we focus on the task of image-based similarity search. Specifically, we design a system that can understand multiple notions of similarity. For example, when searching for items related to an input image of a shoe, users might be interested in shoes of similar color, of similar style, or for the same kind of activity. By capturing the multitude of aspects in terms of which objects can be compared, our system can find the right set of related items. Lastly, we explore how the underlying convolutional networks themselves can be made aware of the context in which they are used. In a study, we first develop a new understanding of the roles that individual layers take on in modern convolutional networks. Then, we leverage these insights and design a network that can adaptively define its own topology conditioned on the input image to increase both accuracy and efficiency.

Date Issued
2018-05-30
Keywords
computer vision
machine learning
Computer science
Committee Chair
Belongie, Serge J.
Committee Member
Kleinberg, Jon M.
Naaman, Mor
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution-NonCommercial-ShareAlike 4.0 International
Rights URI
https://creativecommons.org/licenses/by-nc-sa/4.0/
Type
dissertation or thesis
