Memory in Language Models: Representation and Extraction
Author
Morris, John
Abstract
We explore memory in neural language models as it is stored in model weights after training and in model activations ("embeddings") during inference. We first describe a method for improving embeddings for text retrieval by incorporating surrounding documents into the embedding context. We then propose a new method for inverting text embeddings and demonstrate its applicability to the outputs of general language models. Finally, we measure the information content of model weights, a way to characterize the total amount of information models can store.
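As an illustration of the embedding-inversion setting mentioned in the abstract (not the method developed in this dissertation), the following Python sketch treats the embedder as a black box and greedily searches for text whose embedding is close to a given target embedding. The toy bag-of-words embedder, the small vocabulary, and the greedy token-swap search are all hypothetical stand-ins chosen only to keep the example self-contained and runnable.

```python
# Hypothetical sketch of embedding inversion as black-box search:
# recover text from an embedding by proposing edits and keeping those
# that increase cosine similarity to the target embedding.
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran", "fast", "slowly"]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words embedder: hash each token into a fixed-size vector."""
    vec = np.zeros(dim)
    for tok in text.split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Both inputs are unit-normalized, so the dot product is cosine similarity.
    return float(a @ b)

def invert(target: np.ndarray, steps: int = 20, length: int = 6) -> str:
    """Greedy search: repeatedly swap one token if the swap moves the
    hypothesis embedding closer to the target embedding."""
    rng = np.random.default_rng(0)
    hypothesis = list(rng.choice(VOCAB, size=length))
    best = cosine(embed(" ".join(hypothesis)), target)
    for _ in range(steps):
        for pos in range(length):
            for tok in VOCAB:
                candidate = hypothesis.copy()
                candidate[pos] = tok
                score = cosine(embed(" ".join(candidate)), target)
                if score > best:
                    hypothesis, best = candidate, score
    return " ".join(hypothesis)

if __name__ == "__main__":
    secret = "the cat sat on a mat"
    recovered = invert(embed(secret))
    print("recovered:", recovered)
    print("similarity:", round(cosine(embed(recovered), embed(secret)), 3))
```

Because the toy embedder ignores word order, the recovered text only matches the secret as a bag of words; real inversion methods operate against neural text encoders and must contend with far richer embedding geometry.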
Description
138 pages
Date Issued
2025-12
Committee Chair
Rush, Alexander
Committee Member
Zabih, Ramin
Pierson, Emma
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Type
dissertation or thesis