Research Challenges and Opportunities for Open Generative Modeling

File(s)
Gokaslan_cornellgrad_0058F_15170.pdf (29.47 MB)
Permanent Link(s)
https://doi.org/10.7298/5qjy-5972
https://hdl.handle.net/1813/120915
Collections
Cornell Theses and Dissertations
Author
Gokaslan, Aaron
Abstract

This dissertation develops methods to make generative modeling more accessible, reliable, and legally grounded across vision, biology, and language. I introduce CommonCanvas, an open latent diffusion pipeline trained solely on Creative-Commons-licensed images, using a lightweight “telephoning” caption synthesis step to reach competitive quality with vastly less data. I then present PlantCaduceus, a genomics foundation model that leverages domain structure to outperform much larger generic models on plant biology tasks. Finally, I apply a principled probabilistic framework to quantify memorization and the extraction of copyrighted training data from large language models, moving beyond coarse averages to surface per-work risk and inform safer curation and deployment. Together, these contributions fuse technical innovation with openness and risk-aware practice to lower barriers and challenge prevailing conventions in modern AI.

Description
180 pages
Date Issued
2025-08
Keywords
copyright • deep learning • diffusion models • generative ai • large language models • memorization
Committee Chair
Kuleshov, Volodymyr
Committee Member
Grimmelmann, James
Snavely, Keith
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution-ShareAlike 4.0 International
Rights URI
https://creativecommons.org/licenses/by-sa/4.0/
Type
dissertation or thesis
