Hallucinated Humans: Learning Latent Factors To Model 3D Environments

Jiang, Yun

dc.contributor.author	Jiang, Yun
dc.contributor.chair	Saxena, Ashutosh
dc.contributor.committeeMember	James, Douglas Leonard
dc.contributor.committeeMember	Kleinberg, Robert David
dc.contributor.committeeMember	Joachims, Thorsten
dc.date.accessioned	2015-10-15T18:12:04Z
dc.date.available	2015-10-15T18:12:04Z
dc.date.issued	2015-08-17
dc.description.abstract	The ability to reason correctly about human environments is critical for personal robots. For example, if a robot is asked to tidy a room, it needs to detect object types, such as shoes and books, and then decide where to place them properly. Sometimes being able to anticipate human-environment interactions is also desirable. For example, the robot would not put any object on a chair if it understands that humans would sit on it. The idea of modeling object-object relations has been widely leveraged in many scene understanding applications. For instance, an object found in front of a monitor is more likely to be a keyboard because of the high correlation between the two objects. However, as objects are designed by humans and for human use, when we reason about a human environment, we reason about it through an interplay between the environment, objects and humans. For example, the monitor and the keyboard are strongly spatially correlated only because a human types on the keyboard while watching the monitor. The key idea of this thesis is to model environments not only through objects, but also through latent human poses and human-object interactions. We start by designing a generic form of human-object interaction, also referred to as 'object affordance'. Human-object relations can thus be quantified through a function of object affordance, human configuration and object configuration. Given human poses and object affordances, we can capture the relations among humans, objects and the scene through Conditional Random Fields (CRFs). For scenarios where no humans are present, our idea is to still leverage the human-object relations by hallucinating potential human poses. In order to handle the large number of latent human poses and the large variety of their interactions with objects, we present the Infinite Latent Conditional Random Field (ILCRF), which models a scene as a mixture of CRFs generated from Dirichlet processes. 
In each CRF, we model objects and object-object relations as existing nodes and edges, and hidden human poses and human-object relations as latent nodes and edges. ILCRF generatively models the distribution of different CRF structures over these latent nodes and edges. We apply the model to the challenging applications of 3D scene labeling and robotic scene arrangement. In extensive experiments, we show that our model significantly outperforms the state-of-the-art results in both applications. We test our algorithm on a robot for arranging objects in a new scene using the two aforementioned applications. We further extend the idea of hallucinating static human poses to anticipating human activities. We also present learning-based grasping and placing approaches for low-level manipulation tasks, complementary to the high-level scene understanding tasks.
dc.identifier.other	bibid: 9333229
dc.identifier.uri	https://hdl.handle.net/1813/41178
dc.language.iso	en_US
dc.subject	Robotics
dc.subject	Machine learning
dc.subject	nonparametric learning
dc.title	Hallucinated Humans: Learning Latent Factors To Model 3D Environments
dc.type	dissertation or thesis
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Cornell University
thesis.degree.level	Doctor of Philosophy
thesis.degree.name	Ph. D., Computer Science

Files

Original bundle

Name: yj229.pdf
Size: 35.25 MB
Format: Adobe Portable Document Format

Collections

Cornell Theses and Dissertations