A Latent Space Analysis of the Sketch-to-Model Problem
Abstract
Despite the accomplishments made in the field of computer science, computers still struggle with many tasks that are relatively simple for humans. One such task is interpreting the approximate three-dimensional structure of an object from a ‘sketch,’ defined here as a grayscale two-dimensional line drawing. Algorithmic approaches in which the rules for interpretation are explicitly defined by the programmer have seen some success, but these methods have a limited ability to account for the many nuances of human perception that influence our interpretation of 3D structure. More recent approaches to the related problem of interpreting three-dimensional structure from two-dimensional digital photographs rely on deep learning techniques and large datasets. These approaches perform remarkably well, but many are plagued by artifacts and limitations inherent to the format of the output representation. Research on generating three-dimensional structure with implicit representations has promise in that it avoids the limitations of the other formats, and the goal of this thesis is to generate these implicit representations from sketches instead of digital photographs. This thesis proposes a two-step process for constructing three-dimensional geometry from sketches. First, the sketch is encoded into a sketch-model latent space; second, the sketch’s latent-space representation is used to generate an approximate signed distance field of the object. The first step was accomplished by training a convolutional variational autoencoder on a set of synthetically generated line drawings, with a latent space of 1024 dimensions. Principal Component Analysis was used to obtain a rough estimate of the necessary number of latent dimensions, and the choice was confirmed by measuring the variational autoencoder’s test-set performance across varying numbers of latent dimensions.
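The PCA-based dimensionality estimate described above can be sketched as follows. This is a generic illustration on toy data, not the thesis's exact procedure; the function name `estimate_latent_dims` and the 95% explained-variance target are illustrative assumptions.

```python
import numpy as np

def estimate_latent_dims(data, variance_target=0.95):
    """Estimate how many principal components are needed to capture
    `variance_target` of the total variance in `data`, an
    (n_samples, n_features) array of flattened sketch images.
    Hypothetical helper, not the thesis's implementation."""
    centered = data - data.mean(axis=0)
    # Singular values of the centered data give the PCA spectrum.
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # Smallest k whose leading components reach the variance target.
    return int(np.searchsorted(cumulative, variance_target) + 1)

# Toy data: 200 samples lying mostly in a 10-dimensional subspace
# of a 64-dimensional feature space, plus a little noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(10, 64))
data = rng.normal(size=(200, 10)) @ basis + 0.01 * rng.normal(size=(200, 64))
k = estimate_latent_dims(data, 0.95)
```

On this toy data the estimate lands at or just below the true subspace dimension of 10; in the thesis the same kind of analysis only seeds the choice, which is then validated against the autoencoder's actual test-set performance.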
The second step was accomplished with an eight-layer network modeled after the architecture presented in [Park19], with modifications to account for the larger dimensionality of the input latent codes. To render the object, this second network was queried repeatedly using a modified sphere tracing algorithm to identify the surface, which was then colored using an arbitrary directional light source and simple Lambertian shading. While the networks performed well considering the naïve approach to generating the latent space and the vast increase in latent-space dimensionality compared to [Park19], the quality was insufficient for accurately reconstructing sketch geometries beyond a rough approximation of their convex hull. The low-quality results can be explained by one or more of the following limitations of the approach used in this thesis: a poorly behaved sketch-model latent space, an insufficient amount of training data, suboptimal network architectures, and suboptimal training methods. Possible ways to address these limitations are presented as potential future work.
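The sphere tracing and Lambertian shading described above can be sketched as follows, using an analytic sphere SDF as a stand-in for the learned network. All function names, the finite-difference normal, and the step limits are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def sdf_sphere(p, radius=1.0):
    """Signed distance to a sphere at the origin; stands in for
    the learned signed-distance network."""
    return np.linalg.norm(p) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4):
    """March along the ray, advancing by the SDF value each step
    (safe because the SDF bounds the distance to the surface).
    Returns the hit point, or None if the ray escapes."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:
            return p
        t += d
        if t > 100.0:  # ray left the scene
            break
    return None

def lambert_shade(p, sdf, light_dir, h=1e-3):
    """Lambertian intensity at surface point p, with the normal
    taken as the finite-difference gradient of the SDF."""
    grad = np.array([
        sdf(p + np.array([h, 0, 0])) - sdf(p - np.array([h, 0, 0])),
        sdf(p + np.array([0, h, 0])) - sdf(p - np.array([0, h, 0])),
        sdf(p + np.array([0, 0, h])) - sdf(p - np.array([0, 0, h])),
    ])
    normal = grad / np.linalg.norm(grad)
    light = light_dir / np.linalg.norm(light_dir)
    return max(0.0, float(normal @ light))

# One ray from the camera toward the sphere, lit head-on.
hit = sphere_trace(np.array([0.0, 0.0, -3.0]),
                   np.array([0.0, 0.0, 1.0]), sdf_sphere)
shade = lambert_shade(hit, sdf_sphere, np.array([0.0, 0.0, -1.0]))
```

With a learned SDF, each `sdf(p)` call is a network query, which is why the renderer must evaluate the network repeatedly per ray.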
Description
Supplemental file(s) description:
- An animated image of one of the reconstructions, made by defining an orbital camera path around the object and rendering the object at 22.5-degree steps along the path.
- All of the renders used in various figures of the thesis, with their original colors. As stated in the relevant figures' captions, the renders' colors were modified to improve the quality of printed copies of the work.