Collected mainly in 2002-2003, the Fisher Corpus offers 2 742 hours of speech in the form of 16 454 telephone calls lasting about 10 minutes each on average. The speakers are fairly balanced for gender, at 53% women and 47% men. Ages vary as well: 38% of the speakers were 16-29 years old, 45% were 30-49, and 17% were 50 or older. The project sought to represent the main American varieties of English; see Cieri et al. (2004) for details on regional variability. Canadians, other non-American speakers, and non-native speakers were also included. The data were transcribed manually within the DARPA EARS program, of which the corpus is part.
- Cieri, Christopher, David Miller, and Kevin Walker. 2004. "The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text". Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC) in Lisbon, Portugal: 69-71.