A corpus search methodology for focus realization

Howell, Jonathan; Rooth, Mats
2009-07-05
https://hdl.handle.net/1813/13093
Poster presentation, 157th Meeting of the Acoustical Society of America. Abstract appears in J. Acoust. Soc. Am., Volume 125, Issue 4, p. 2573.

We describe a methodology for investigating the semantic-grammatical conditioning and phonetic realization of contrastive intonation, using a web harvest of particular word strings followed by grammatical and acoustic analysis. A commercial audio web search engine based on speech recognition retrieved 179 MP3 files purportedly containing a token of the string 'than I did'. In this comparative clause fragment, contrastive focus commonly falls on the subject ('she did more than I_F did'), on 'did' ('I wish I had done more than I did_F'), or on material following the fragment ('I said more now than I did before_F'). The 96 true tokens of 'than I did' were classified by grammatical and semantic criteria into the categories 'subject', 'did', and 'following'. For each token, 5 segment intervals were hand-annotated, and more than 300 acoustic parameters were extracted using a Praat script. Support vector machine (SVM) classifiers were trained to identify focus classes from these acoustic parameters. In a 10-fold cross-validation test, the classifier achieved 90.2% accuracy in discriminating the dominant 'subject' and 'following' classes. In a listening task, human subjects achieved a comparable accuracy of 90.3% when given only the acoustic target 'than I did'. Stepwise logistic regression identified measures of duration, f0, intensity, formants, and formant bandwidths among the significant factors.

Keywords: prosody; focus; contrastive intonation; comparative; phonetics; support vector machine; web harvest
Type: presentation
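The abstract reports accuracy from a 10-fold cross-validation of SVM classifiers trained on acoustic parameters. The following is a minimal sketch of what such an evaluation loop can look like under stated assumptions; it is not the authors' code or toolchain. It assumes two classes coded +1/-1 (for example 'subject' vs. 'following'), one numeric feature vector per token, z-scoring fitted on each training fold only, and stratified fold assignment. The classifier is a swapped-in, dependency-free linear SVM trained by Pegasos-style stochastic subgradient descent, standing in for whatever SVM implementation was actually used. All function names (`zscore_fit`, `pegasos_train`, `stratified_folds`, `cross_val_accuracy`, etc.) are hypothetical, and the demo data are synthetic, not the study's measurements.

```python
import random


def zscore_fit(X):
    """Per-feature mean and standard deviation (population SD)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(var ** 0.5 or 1.0)  # guard against constant features
    return means, sds


def zscore_apply(X, means, sds):
    """Standardize rows of X with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def pegasos_train(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss + L2 penalty) via Pegasos-style stochastic
    subgradient descent. Labels must be +1 or -1. A constant 1.0 feature
    is appended as a bias term (and, for simplicity, is also regularized)."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def pegasos_predict(w, row):
    """Sign of the decision function w . [row, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score >= 0.0 else -1


def stratified_folds(y, k=10, seed=0):
    """Split indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_val_accuracy(X, y, k=10, lam=0.01, epochs=200):
    """Pooled k-fold cross-validated accuracy: each token is classified
    exactly once, by a model that never saw it during training."""
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Fit standardization on the training fold only, to avoid leakage.
        means, sds = zscore_fit([X[i] for i in train_idx])
        X_train = zscore_apply([X[i] for i in train_idx], means, sds)
        X_test = zscore_apply([X[i] for i in test_idx], means, sds)
        w = pegasos_train(X_train, [y[i] for i in train_idx], lam, epochs)
        correct += sum(pegasos_predict(w, row) == y[i]
                       for row, i in zip(X_test, test_idx))
    return correct / len(y)


if __name__ == "__main__":
    # Synthetic stand-in data: two Gaussian clouds in a 3-feature space.
    # These are NOT the study's acoustic measurements.
    rng = random.Random(42)
    X, y = [], []
    for label, centre in ((1, 1.0), (-1, -1.0)):
        for _ in range(40):
            X.append([rng.gauss(centre, 1.0) for _ in range(3)])
            y.append(label)
    print(f"10-fold CV accuracy: {cross_val_accuracy(X, y, k=10):.3f}")
```

Accuracy here is pooled over all held-out tokens, which is one common way to report a cross-validated score; averaging per-fold accuracies is an alternative and can differ slightly when folds are unequal in size. Fitting the standardization inside each fold keeps test-fold statistics from leaking into training, which matters when features such as f0, duration, and intensity live on very different scales.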