Cornell University
Library

eCommons

“NOTHING TOO SERIOUS”: CORPUS RESOURCES AND METHODS FOR DATA–DRIVEN APPROACHES TO POLARITY SENSITIVITY

File(s)
Hummel_cornellgrad_0058F_14988.pdf (3.21 MB)
Permanent Link(s)
https://doi.org/10.7298/dsdc-9e03
https://hdl.handle.net/1813/117573
Collections
Cornell Theses and Dissertations
Author
Hummel, Andrea
Abstract

This dissertation introduces the Polar Bigrams Resource (PBR), a large-scale corpus-based dataset designed to support data-driven investigations of polarity sensitivity. To address a major challenge for bottom-up processing, namely the lack of overt indicators of polarity environments, this work employs polarity approximations, paired with aggressive post-processing enabled by the statistical power of large-scale data. A recurring theme of asymmetry emerges, seen in the empirical polarity landscape (positive polarity dominates negative 22:1 in PBR) and in the theoretical structure of polarity licensing (polarity-sensitive items require licensing, but not vice versa). These asymmetries motivate the use of (backwards) asymmetric association measures to quantify the polarity sensitivity of lexical units. The 72 million bigram tokens of PBR are drawn from dependency-parsed corpora, including the novel corpus Puddin, derived from a portion of The Pile. Contributing roughly 1.4 billion words, Puddin is largely responsible for PBR's empirical scale. Bigram polarity is approximated by defining positive polarity as either absence of a negative or presence of a positive. A secondary sampling method balances polarity sizes by down-sampling positive samples. These methods define four comparison spaces, offering flexibility in analytical design. To support transparency and replication, detailed descriptions of data-processing procedures are provided for both Puddin and PBR. A detailed case study of the adverb “exactly” illustrates the practical application of PBR. It provides a fine-grained analysis of “exactly” across comparison spaces and relevant lexical units (“exactly” alone, bigrams containing “exactly”, and the most relevant adjectives considered independently) and includes PBR-sourced examples to elucidate the quantitative findings.
The patterns uncovered suggest a preliminary theory in which pre-adjectival “exactly” is contingently polarity sensitive, depending on the scalar semantics of its adjective argument. Together, these methodological contributions and findings offer a new empirical foundation for polarity research and demonstrate the power, and the limits, of data-driven approaches to evaluating complex semantic and pragmatic phenomena.
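
The abstract mentions asymmetric association measures and down-sampling of positive samples without naming a specific formula or procedure. As a rough illustration only, the sketch below uses ΔP, one well-known directional association measure, computed in both directions over a 2x2 table, plus a simple random down-sampling helper. The choice of ΔP, the `Counts` layout, all function names, and the toy numbers are assumptions made for this example and are not taken from the dissertation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical 2x2 contingency counts for one lexical unit."""
    unit_neg: int   # tokens of the unit in negative-polarity environments
    unit_pos: int   # tokens of the unit in positive-polarity environments
    other_neg: int  # all other tokens in negative-polarity environments
    other_pos: int  # all other tokens in positive-polarity environments


def delta_p_env_given_unit(c: Counts) -> float:
    """Delta P(negative | unit) = P(neg | unit) - P(neg | not unit).

    Assumes every row and column total is nonzero.
    """
    return (c.unit_neg / (c.unit_neg + c.unit_pos)
            - c.other_neg / (c.other_neg + c.other_pos))


def delta_p_unit_given_env(c: Counts) -> float:
    """Delta P(unit | negative) = P(unit | neg) - P(unit | pos)."""
    return (c.unit_neg / (c.unit_neg + c.other_neg)
            - c.unit_pos / (c.unit_pos + c.other_pos))


def downsample_positive(pos: list, neg: list, seed: int = 0) -> list:
    """Randomly shrink the positive sample to the size of the negative one."""
    if len(pos) <= len(neg):
        return list(pos)
    return random.Random(seed).sample(pos, len(neg))


# Toy counts (invented) with a 22:1 positive-to-negative ratio overall.
toy = Counts(unit_neg=80, unit_pos=20, other_neg=920, other_pos=21980)
forward = delta_p_env_given_unit(toy)
backward = delta_p_unit_given_env(toy)
```

On the toy table the two directions give clearly different values, which is the sense in which such a measure is asymmetric: how strongly the unit predicts the environment is not the same as how strongly the environment predicts the unit.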

Description
495 pages
Date Issued
2025-05
Keywords
adverb adjective bigrams • computational methods • large-scale corpus data • polarity sensitivity • pre-adjectival exactly • scalar semantics
Committee Chair
Rooth, Mats
Committee Member
Murray, Sarah
Abusch, Dorit
Degree Discipline
Linguistics
Degree Name
Ph. D., Linguistics
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16938374
