Cornell University Library

eCommons


Improving the Quality, Efficiency and Understanding of Generative Models

Access Restricted

Access to this document is restricted. Some items have been embargoed at the request of the author, but will be made publicly available after the "No Access Until" date.

During the embargo period, you may request access to the item by clicking the link to the restricted file(s) and completing the request form. If we have contact information for a Cornell author, we will contact the author and request permission to provide access. If we do not have contact information for a Cornell author, or the author denies or does not respond to our inquiry, we will not be able to provide access. For more information, review our policies for restricted content.

File(s)
Gu_cornellgrad_0058F_15360.pdf (139.6 MB)
No Access Until
2028-01-08
Permanent Link(s)
https://doi.org/10.7298/48dc-k889
https://hdl.handle.net/1813/121140
Collections
Cornell Theses and Dissertations
Author
Gu, Zeqi
Abstract

Visual generation lies at the intersection of perception and imagination. Humans can effortlessly create, edit, and reason about complex visual scenes -- an ability that enables storytelling, communication, and design. Building artificial systems with similar generative capacities that connect understanding with creativity is a central challenge in computer vision and graphics. Such capability also underpins numerous applications in digital media, virtual reality, and content creation. In recent years, generative models have made remarkable progress toward this goal, producing high-fidelity images, animations, and 3D scenes. Despite this success, existing methods still face key challenges in controllability, computational efficiency, and reasoning intelligence. This dissertation explores how to improve the quality, efficiency, and understanding of generative models across diverse visual modalities.

The first part of this work addresses visual understanding through factorized video generation. FactorMatte introduces a counterfactual formulation of video matting that decomposes scenes into physically meaningful layers, enabling realistic re-composition and editing.

The second part investigates improving the quality and efficiency of diffusion-based image generation. Filter-Guided Diffusion develops a training-free, architecture-independent guidance method that accelerates sampling while preserving fidelity. AniDiffusion enhances the controllability of pose-conditioned animation generation by learning automatic rigging from minimal examples, and ArtiScene advances language-driven 3D scene generation by leveraging 2D diffusion intermediaries to achieve stylistic consistency and diverse layouts without additional training.

The final part focuses on improving the reasoning capabilities and efficiency of autoregressive multimodal models. ShortCoTI introduces an RLHF (Reinforcement Learning from Human Feedback)-based optimization that reduces redundant reasoning steps in autoregressive image generation while maintaining or improving output quality. Together, these contributions advance the development of generative systems that are high-quality, computationally efficient, and capable of structured understanding across modalities such as images, videos, and 3D scenes.

Description
227 pages
Date Issued
2025-12
Keywords
Computer Graphics • Computer Vision • Generative Models • Multimodal Models
Committee Chair
Davis, Myers
Committee Member
Snavely, Keith
Estrin, Deborah
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
