Robustly Modeling The World From Photos
A camera is a device for compressing rich information about the visual appearance of the three-dimensional world into a two-dimensional image. This process is inherently lossy: given an image, we can make educated guesses about the world's shape and appearance, but there is not enough information for our guesses to be certain. However, if we take several pictures from different viewpoints, we can reason more confidently. This thesis focuses on shape: How can we determine the geometry of the world from a collection of photos? This problem is classically called Structure from Motion: we recover the structure (the shape of the world) from camera motion (different viewpoints). Internet photo collections are an especially interesting source of data. With simple searches we can collect the raw information to reconstruct 3D models of famous world landmarks or entire cities. However, the photos we download are disorganized and noisy, and were not collected with 3D reconstruction in mind. While some impressive demonstrations of Structure from Motion systems exist, the next generation of solvers will need to be far more robust to the many types of difficulties encountered in the wild. To this end, many recent solvers pose the problem in a new way, using relative relationships between images to infer first the orientations, and then the positions, of every camera in a scene. This framework promises faster runtime and greater robustness. I contribute a theoretical analysis of the difficulty of finding camera orientations, giving a way to decide which problems are tractable and which ones might be too hard and should be reformulated. I also propose a new solver with an accompanying outlier filter for finding camera positions. Finally, some of the hardest scenes are those which contain ambiguous structures: distinct objects that look the same. These induce self-consistent errors which are often too confusing for solvers to resolve correctly. I describe a scalable system which uses a graph-topological cue from a visibility graph to detect and remove these sources of error. Together these improvements work towards a robust solution to the Structure from Motion problem, so that we can reliably build 3D models of the world, even from noisy and confusing internet photo collections.
Computer Vision; Structure from Motion; 3D Reconstruction
Connelly, Robert; Bindel, David S.; Bala, Kavita
Ph.D., Applied Mathematics
Doctor of Philosophy
dissertation or thesis