Cornell University Library

eCommons


Evaluating language models applied to student thinking about experiments

File(s)
Fussell_cornellgrad_0058F_15230.pdf (3.32 MB)
Permanent Link(s)
https://doi.org/10.7298/7cqa-r637
https://hdl.handle.net/1813/120819
Collections
Cornell Theses and Dissertations
Author
Fussell, Rebeckah
Abstract

Recent advancements in natural language processing (NLP) have enabled education researchers to analyze text-based open-response data efficiently and at scale. Here we investigate the use of these methods for a variety of education research questions across multiple data sets. First, we compare the application of various language models to perform large-scale identification of experimental skills in students' typed lab notes through sentence-level labeling. We find that fine-tuned higher-resource models often perform better than fine-tuned lower-resource models, but few-shot implementations of higher-resource models do not perform better than fine-tuned models. Second, we investigate methods to assess the trustworthiness of education research claims made using machine-coded data. We propose a four-part method for making such claims with supervised natural language processing, grounded in experimental practices from physics labs, including quantification of uncertainty to calibrate measurements. We provide evidence for this method using data from two distinct short-response survey questions with two distinct coding schemes, and work through a real-world example of using these practices to machine code a data set unseen by human coders. We then implement both a supervised analysis and an unsupervised analysis of data from a survey that assesses student thinking about measurement in both quantum and classical contexts. In the supervised portion, we train models to apply a single consistent coding scheme across five open-ended questions on this survey, and we build on results from the previous chapter by demonstrating a second method to calibrate measurements. In the unsupervised portion, we perform a cluster analysis on student responses to a question that assesses reasoning about sources of uncertainty in four experimental physics scenarios.
We conclude our investigations into methods for automating the analysis of open-response questions with a study that does not apply natural language processing methods at all. Rather, this study compares closed-response versions of questions that assess students' reasoning with open-response versions. We conduct an experiment to test whether asking students for their reasoning in an open-response version of a question changes their answer to a subsequent closed-response version, and we do not find an effect. In the same experiment, we investigate whether we measure students' experimental reasoning differently on the closed-response version compared to the open-response version. We discuss the differences we find between the two formats, suggesting that the open-response and closed-response versions of the questions measure different aspects of student thinking. Together, these analyses form a foundation for education researchers to evaluate a range of both established and emerging methods for conducting research at speed and scale. Simultaneously, these analyses expand the toolkit available to education researchers for assessing student thinking about experiments.

Description
230 pages
Date Issued
2025-08
Keywords
large language models • natural language processing • physics education research • responsible AI • STEM education research
Committee Chair
Holmes, Natasha
Committee Member
Elser, Veit
Thom-Levy, Julia
Degree Discipline
Physics
Degree Name
Ph.D., Physics
Degree Level
Doctor of Philosophy
Type
dissertation or thesis