Robustly Modeling The World From Photos

A camera is a device for compressing rich information about the visual appearance of the three-dimensional world into a two-dimensional image. This process is inherently lossy: given an image, we can make educated guesses about the world's shape and appearance, but there is not enough information for our guesses to be certain. However, if we take several pictures from different viewpoints, we can reason more confidently. This thesis focuses on shape: how can we determine the geometry of the world from a collection of photos? This problem is classically called Structure from Motion: we find the structure (the shape of the world) from camera motion (different viewpoints).

Internet photo collections are an especially interesting source of data. With simple searches we can collect the raw information needed to reconstruct 3D models of famous world landmarks or entire cities. However, the photos we download are disorganized and noisy, and were not collected with 3D reconstruction in mind. While some impressive demonstrations of Structure from Motion systems exist, the next generation of solvers will need to be far more robust to the many types of difficulties encountered in the wild. To this end, many recent solvers pose the problem in a new way, using relative relationships between images to infer first the orientations, and then the positions, of every camera in a scene. This framework promises faster runtime and greater robustness.

I contribute a theoretical analysis of the difficulty of finding camera orientations, giving a way to decide which problems are tractable and which might be too hard and should be reformulated. I also propose a new solver, with an accompanying outlier filter, for finding camera positions. Some of the hardest scenes, however, are those that contain ambiguous structures: distinct objects that look the same. These induce self-consistent errors which are often too confusing for solvers to resolve correctly. I describe a scalable system which uses a graph-topological cue from a visibility graph to detect and remove these sources of error. Together, these improvements work towards a robust solution to the Structure from Motion problem, so that we can reliably build 3D models of the world, even from noisy and confusing internet photo collections.

Computer Vision; Structure from Motion; 3D Reconstruction


Committee Chair

Snavely, Keith Noah

Committee Member

Bindel, David S.

Degree Discipline

Applied Mathematics

Degree Name

Ph.D., Applied Mathematics

Degree Level

Doctor of Philosophy

dissertation or thesis
