Robustly Modeling The World From Photos

A camera is a device for compressing rich information about the visual appearance of the three-dimensional world into a two-dimensional image. This process is inherently lossy: given an image, we can make educated guesses about the world's shape and appearance, but there is not enough information for our guesses to be certain. However, if we take several pictures from different viewpoints, we can reason more confidently. This thesis focuses on shape: how can we determine the geometry of the world from a collection of photos? This problem is classically called Structure from Motion: we find the structure (the shape of the world) from camera motion (different viewpoints). Internet photo collections are an especially interesting source of data. With simple searches we can collect the raw information needed to reconstruct 3D models of famous world landmarks or entire cities. However, the photos we download are disorganized and noisy, and were not taken with 3D reconstruction in mind. While some impressive demonstrations of Structure from Motion systems exist, the next generation of solvers will need to be far more robust to the many kinds of difficulty encountered in the wild. To this end, many recent solvers pose the problem in a new way, using relative relationships between images to infer first the orientation and then the position of every camera in a scene. This framework promises faster runtime and greater robustness. I contribute a theoretical analysis of the difficulty of finding camera orientations, giving a way to decide which problems are tractable and which might be too hard and should be reformulated. I also propose a new solver, with an accompanying outlier filter, for finding camera positions. However, some of the hardest scenes are those that contain ambiguous structures: distinct objects that look the same. These induce self-consistent errors that are often too confusing for solvers to resolve correctly.
I describe a scalable system that uses a graph-topological cue from a visibility graph to detect and remove these sources of error. Together, these improvements work toward a robust solution to the Structure from Motion problem, so that we can reliably build 3D models of the world even from noisy and confusing internet photo collections.
Keywords
Computer Vision; Structure from Motion; 3D Reconstruction
Committee Chair
Snavely, Keith Noah
Committee Member
Bindel, David S.
Degree Discipline
Applied Mathematics
Degree Name
Ph. D., Applied Mathematics
Degree Level
Doctor of Philosophy
Type
dissertation or thesis