Santa Barbara Corpus

From MLML

The Santa Barbara Corpus consists of spontaneous speech from across the United States, recorded predominantly during face-to-face interactions. The data were collected by the Department of Linguistics at the University of California, Santa Barbara, and comprise about 249,000 words (Parts 1-4). Time-stamped transcriptions are provided alongside the audio.

External Links