A corpus search methodology for focus realization

dc.contributor.author | Howell, Jonathan
dc.contributor.author | Rooth, Mats
dc.description | Poster presentation, 157th Meeting of the Acoustical Society of America. Abstract appears in J. Acoust. Soc. Am., Volume 125, Issue 4, p. 2573. | en_US
dc.description.abstract | We describe a methodology for investigating the semantic-grammatical conditioning and phonetic realization of contrastive intonation. The method uses a web harvest of particular word strings, followed by grammatical and acoustic analysis. A commercial audio web search engine that uses speech recognition retrieved 179 MP3 files purportedly containing a token of the string 'than I did'. In this comparative clause fragment, contrastive focus commonly falls on the subject ('she did more than I_F did'), on 'did' ('I wish I had done more than I did_F'), or on material following 'did' ('I said more now than I did before_F'). The 96 true tokens of 'than I did' were classified into the categories 'subject', 'did', and 'following' by grammatical and semantic criteria. For each token, 5 segment intervals were hand-annotated, and more than 300 acoustic parameters were extracted using a Praat script. Support vector machine (SVM) classifiers were trained to identify focus class from acoustic criteria. In a 10-fold cross-validation test, the classifier achieved 90.2% accuracy in discriminating the dominant 'subject' and 'following' classes. In a listening task, human subjects achieved a comparable accuracy of 90.3% when given only the acoustic target 'than I did'. Stepwise logistic regression identified measures of duration, f0, intensity, formants, and formant bandwidths among the significant factors. | en_US
dc.subject | contrastive intonation | en_US
dc.subject | support vector machine | en_US
dc.subject | web harvest | en_US
dc.title | A corpus search methodology for focus realization | en_US
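The abstract reports a binary SVM classifier ('subject' vs. 'following') evaluated with 10-fold cross-validation. Below is a minimal Python sketch of that evaluation loop under stated assumptions. The model is a linear SVM trained by Pegasos-style subgradient descent on the hinge loss. This solver is a swapped-in choice, since the abstract does not state the kernel or training method. The data are two hypothetical synthetic features standing in for the 300+ Praat measures. All function names (`train_linear_svm`, `k_fold_accuracy`, `make_toy_data`) are illustrative, not from the study, and the sketch's output has no bearing on the reported 90.2% figure.

```python
import random

# Sketch only: a linear SVM (Pegasos-style subgradient descent) scored with
# plain, unstratified k-fold cross-validation. The features and class sizes
# are hypothetical synthetic placeholders, not data from the study.


def _augment(x):
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return list(x) + [1.0]


def train_linear_svm(xs, ys, lam=0.1, epochs=20, seed=0):
    """Minimize the L2-regularized hinge loss with step size 1/(lam*t)."""
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, only for margin violations.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 ('subject') or -1 ('following') from the sign of w.x."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1


def k_fold_accuracy(xs, ys, k=10, seed=0):
    """Pooled accuracy: each token is held out and tested exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(w, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


def make_toy_data(n_per_class=40, seed=1):
    """Hypothetical two-feature tokens: +1 = 'subject', -1 = 'following'."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print(f"10-fold accuracy: {k_fold_accuracy(xs, ys, k=10):.3f}")
```

With hundreds of acoustic parameters on very different scales, features would normally be standardized before training an SVM. Examples are z-scores computed within each training fold, so that no statistics leak from the held-out fold.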


Original bundle
516.94 KB
Adobe Portable Document Format