Detecting Common Objects in Context

dc.contributor.author: Lin, Tsung-Yi
dc.contributor.chair: Belongie, Serge J.
dc.contributor.committeeMember: Chen, Tsuhan
dc.contributor.committeeMember: Snavely, Keith Noah
dc.date.accessioned: 2018-04-26T14:15:33Z
dc.date.available: 2018-04-26T14:15:33Z
dc.date.issued: 2017-08-30
dc.description.abstract: Visual scene understanding is a basic function of human perception and one of the primary goals of computer vision. Object detection, which involves recognizing and localizing the objects present in an environment, is a fundamental task in scene understanding. In recent years, object detection has been one of the most rapidly developing research areas in computer vision. Progress has been made through the combined efforts of large-scale datasets, high-quality annotations, and feature representations learned with novel convolutional neural network architectures. This thesis discusses both the process of dataset creation and the subsequent challenges in algorithm design for object detection. We create a large-scale visual dataset, Common Objects in COntext (COCO), that contains objects in everyday scenes together with detailed instance segmentation masks. The COCO dataset aims to enable research on detecting objects in unconstrained environments and presents the combined challenges of recognizing objects in context and accurately localizing instances in 2D. We then discuss algorithm designs that address the challenges posed by the COCO dataset. First, we focus on learning multiscale feature representations to improve detection performance over a wide range of object scales. We show that by leveraging the pyramidal shape of the feature hierarchy in a convolutional neural network (ConvNet), we can learn multiscale pyramidal feature representations that are semantically strong at all levels. The proposed Feature Pyramid Network (FPN) provides generic feature representations that greatly improve both the accuracy and the speed of various object detection applications. We then identify the extreme class imbalance between foreground and background examples as an inherent challenge in designing the training objective of object detection algorithms. To address it, we propose a novel Focal Loss that focuses learning on important examples and ignores most easy background examples. Finally, we propose RetinaNet, a simple one-stage dense object detector that combines the focal loss with FPN and achieves state-of-the-art accuracy and speed on the COCO dataset.
dc.identifier.doi: https://doi.org/10.7298/X4PC30H4
dc.identifier.other: Lin_cornellgrad_0058F_10491
dc.identifier.other: http://dissertations.umi.com/cornellgrad:10491
dc.identifier.other: bibid: 10361388
dc.identifier.uri: https://hdl.handle.net/1813/56711
dc.language.iso: en_US
dc.subject: Crowdsourcing
dc.subject: Object Recognition
dc.subject: Artificial intelligence
dc.subject: computer vision
dc.subject: Computer science
dc.subject: machine learning
dc.title: Detecting Common Objects in Context
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Electrical and Computer Engineering
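
Illustrative sketch: the abstract above describes a Focal Loss that down-weights the many easy background examples during detector training. The snippet below is a minimal NumPy sketch of that idea, not code from the thesis itself; the function name is made up for illustration, and the defaults gamma = 2.0 and alpha = 0.25 follow the published RetinaNet formulation.

```python
# Minimal sketch of the binary focal loss described in the abstract.
# FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-example focal loss.

    p     : predicted foreground probabilities in (0, 1)
    y     : ground-truth labels, 1 for foreground, 0 for background
    gamma : focusing parameter; gamma = 0 recovers weighted cross-entropy
    alpha : class-balancing weight for the foreground class
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)            # avoid log(0)
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# A confidently classified background example (p_t close to 1) contributes
# almost nothing, while a hard foreground example keeps a sizeable loss,
# which is the mechanism the abstract refers to.
print(focal_loss(np.array([0.01, 0.9]), np.array([0, 1])))
```

The (1 - p_t)^gamma factor is what shifts training effort away from the overwhelming number of easy negatives, which is how a one-stage detector such as RetinaNet can be trained densely without being swamped by background.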

Files

Name: Lin_cornellgrad_0058F_10491.pdf
Size: 11.46 MB
Format: Adobe Portable Document Format