A web application for filtering and annotating web speech data
Abstract
A vast and growing amount of recorded speech is freely available on the web, including podcasts, radio broadcasts, and posts on media-sharing sites. However, finding specific words or phrases in online speech data remains a challenge for researchers, not least because transcripts of these data are often automatically generated and imperfect. We have developed a web application, “ezra”, that addresses this challenge by allowing non-expert and potentially remote annotators to filter and annotate speech data collected from the web and produce large, high-quality data sets suitable for speech research. We have used this application to filter and annotate thousands of speech tokens. Ezra is freely available on GitHub, and development continues.
Sponsorship
NSF 1035151 RAPID: Harvesting Speech Datasets for Linguistic Research on the Web (Digging into Data Challenge)
Date Issued
2013-07-22
Publisher
Special Interest Group of the Association for Computational Linguistics on Web as Corpus (ACL SIGWAC)
Keywords
corpus; speech; web interface; annotation; filtering; prosody
Previously Published As
Stefan Evert, Egon Stemle and Paul Rayson (editors). Proceedings of the 8th Web as Corpus Workshop. July, 2013.
Types
article