eCommons

 

Capturing and Understanding Photos Autonomously

Abstract

This work considers autonomous photo capture and improving visual understanding from large collections of photos. For photo capture, we are interested in capturing aesthetically pleasing photos autonomously. Capturing a well-composed photo is difficult and takes years of experience to master. We propose a novel pipeline for an autonomous agent to capture an aesthetic photograph by navigating within a local region of a scene. Instead of classical optimization over heuristics such as the rule of thirds, we adopt a data-driven aesthetics estimator to assess photo quality. A reinforcement learning framework is used to optimize the model with respect to the learned aesthetics metric. We train our model in simulation with indoor scenes, and we demonstrate that our system can capture aesthetic photos in both simulated and real-world environments on a ground robot. To our knowledge, it is the first system that can automatically explore an environment to capture an aesthetic photo with respect to a learned aesthetics estimator. While reinforcement learning works well when realistic simulated environments are available, a more versatile approach is to learn directly from collections of videos. Video clips are abundant and easy to obtain in any domain of interest, unlike 3D environment reconstructions, which can be expensive and tedious to obtain. We discuss how autonomous photo capture can be done without relying on reinforcement learning, in a way that allows us to train an autonomous photo capture model using videos.
An orthogonal goal we consider in this work is improving visual understanding from large collections of photos. Specifically, we focus on the task of activity inference: given a photo of a scene, we are interested in predicting the activities that could be performed at or around that scene. Instead of using user-collected annotations for this task, we construct a dataset by taking advantage of geotags from online photos. Specifically, we rely on the assumption that if there is a photo of a user engaging in an activity, then photos taken nearby can use that activity as one of their labels for the activity inference task. We propose a pipeline to construct an activity inference dataset, and we analyze both the constructed dataset and the behavior of models trained on it. We show that, unlike baseline methods, our trained models attend to regions of the image that correspond to our intuitive understanding of the activity. We hope that autonomous photography will help users capture well-composed photos of their environments or assist users in the process. We expect that interest in photography and sharing photos will continue to rise, and that the photos users share can help improve the visual understanding of autonomous systems.

Description

66 pages

Date Issued

2022-08

Keywords

activity inference; aesthetics; autonomous photography; photography

Committee Chair

Bala, Kavita

Committee Member

Acharya, Jayadev
Hariharan, Bharath

Degree Discipline

Computer Science

Degree Name

M.S., Computer Science

Degree Level

Master of Science

Types

dissertation or thesis
