Sounds of the City

Sounds of the City is a corpus of 142 speakers of vernacular Glaswegian English, spanning around 100 years. Its aim is to examine changes in Glaswegian over time. This page describes the steps needed to prepare the corpus for use with the Montreal Forced Aligner and for import into the SPADE project.

Get Dataset

This data can be found on Havarti at corpora/SOTC.

Treating Audio

None needed.

Treating Transcripts

None needed.

Dictionary

The dictionary being used for this task is CELEX, on Havarti at corpora/CELEX.

Alignment

Already aligned.

Import

The following audio files have no corresponding transcript and should therefore be left out of the directory to be imported:

 70-Y-m07-labo.wav
 00-O-m01-mw74.wav
 00-Y-m03-medi.wav
 00-Y-m04-medi.wav
 70-mc-M-m02-mlay.wav
 70-mc-M-m03-mlay.wav
 80-O-f01-clbk.wav
 80-O-f02-clbk.wav
 80-O-f03-clbk.wav
 80-O-m01-clbk.wav
 80-O-m02-clbk.wav
 80-O-m03-clbk.wav
 80-O-m04-clbk.wav
 80-O-m05-clbk.wav
 80-O-m06-clbk.wav
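
One way to build the import directory without these files is sketched below. It is only an illustration: the function name build_import_dir and the idea of copying into a separate directory are assumptions, not part of the documented workflow, and the sketch assumes a flat directory in which audio and transcripts sit side by side.

```python
import shutil
from pathlib import Path

# File names copied from the list above: audio with no matching transcript.
MISSING_TRANSCRIPT = {
    "70-Y-m07-labo.wav",
    "00-O-m01-mw74.wav",
    "00-Y-m03-medi.wav",
    "00-Y-m04-medi.wav",
    "70-mc-M-m02-mlay.wav",
    "70-mc-M-m03-mlay.wav",
    "80-O-f01-clbk.wav",
    "80-O-f02-clbk.wav",
    "80-O-f03-clbk.wav",
    "80-O-m01-clbk.wav",
    "80-O-m02-clbk.wav",
    "80-O-m03-clbk.wav",
    "80-O-m04-clbk.wav",
    "80-O-m05-clbk.wav",
    "80-O-m06-clbk.wav",
}


def build_import_dir(source_dir, import_dir, excluded=MISSING_TRANSCRIPT):
    """Copy every file in source_dir to import_dir, skipping excluded names.

    Hypothetical helper: assumes a flat directory layout (no subfolders).
    Returns the sorted list of excluded files that were actually found.
    """
    source = Path(source_dir)
    target = Path(import_dir)
    target.mkdir(parents=True, exist_ok=True)

    skipped = []
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue  # subdirectories are ignored in this sketch
        if path.name in excluded:
            skipped.append(path.name)
            continue
        shutil.copy2(path, target / path.name)
    return skipped
```

The returned list makes it easy to check that all 15 files were found and left out before running the import.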

The corpus can then be imported using the Labbcat parser.

This corpus is imported in PolyglotDB under the name SOTC.
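
The import step might look roughly like the sketch below. It assumes PolyglotDB is installed, that its database is running with default connection settings, and that corpus_dir points at the prepared directory; the wrapper function import_sotc is hypothetical and exists only for illustration.

```python
def import_sotc(corpus_dir, corpus_name="SOTC"):
    """Import the prepared directory into PolyglotDB (hypothetical wrapper)."""
    # Imported inside the function so this sketch can be loaded
    # on a machine where PolyglotDB is not installed.
    import polyglotdb.io as pgio
    from polyglotdb import CorpusContext

    # Build a parser for LaBB-CAT style files found in corpus_dir.
    parser = pgio.inspect_labbcat(corpus_dir)

    # Assumption: default connection settings for the local database.
    with CorpusContext(corpus_name) as c:
        c.load(parser, corpus_dir)
```

Calling import_sotc with the path to the prepared directory would then load the corpus under the name SOTC.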