Meaning And Prosody: On The Web, In The Lab And From The Theorist's Armchair

I present a new approach to research on meaning and prosody, using speech "harvested" from the web. I advocate a pluralistic view of linguistic data and methodology, within which web-harvested speech plays a vital role. I show that web-harvested speech can be used effectively with computational and experimental methods on the one hand, and qualitative, impressionistic study on the other. My domain of inquiry is the well-known correlation between (i) which information in a discourse is most important (e.g. new or contrastive); and (ii) which material in an utterance is realized with prosodic prominence (e.g. stress, accent), which I refer to as "focus". In Chapter 2, I describe the method of harvesting speech data from the web, quantify its efficacy and discuss possible improvements. In Chapter 3, I investigate the location and acoustic realization of focus in comparative clauses (e.g. than I did). Using machine learning and human classifiers, I discover a robust correlation between particular acoustic cues of prosodic prominence and the location of focus predicted by linguistic theory. From the robustness of non-intonational acoustic cues, I hypothesize that focus may be realized by discrete, paradigmatic (i.e. cross-utterance) categories of stress. Results obtained from the web-harvested speech are cross-validated in a laboratory production experiment with stimuli modeled on the web data. Experimental results also confirm a distinct but ambiguous prosodic realization of "second occurrence focus", which has been central to debates surrounding the semantics of focus. In Chapters 4 and 5, I investigate the adnominal emphatic reflexive (ER; e.g. himself in Jane met Chomsky himself). I argue that it is an instance of a theoretically predicted but poorly attested focus-sensitive operator having sub-propositional scope. Using constructed data and personal introspection, I argue that the adnominal ER exhibits the expected pragmatic, semantic, syntactic and prosodic properties of focus-sensitive constructions, and I reconcile opposing approaches to its semantics. Finally, through careful investigation of the intonation and discourse context of individual examples of the adnominal ER, I debunk a deterministic view of focus, according to which certain linguistic constructions in a language are inherently or obligatorily focused.

NSF 1035151 RAPID: Harvesting Speech Datasets for Linguistic Research on the Web (Digging into Data Challenge)
SSHRC Digging into Data Challenge Grant 869-2009-0004 Project: Harvesting Speech Datasets for Linguistic Research on the Web

Committee Chair

Rooth, Mats

Committee Member

Cohn, Abigail C
Wagner, Michael

Degree Name

Ph.D., Linguistics

Degree Level

Doctor of Philosophy

dissertation or thesis
