Switchboard

From MLML


Dataset location

The Switchboard corpus was already available on the MLML server.

Audio

The Switchboard audio was stored as NIST SPHERE (.sph) files. A script was written to automate their conversion using SoX.
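A batch conversion of this kind might look like the sketch below. It is not the original script. It assumes WAV as the target format, a flat output directory, and re-encoding to 16-bit signed PCM (`-b 16 -e signed-integer`); all three are illustrative choices, and the function names are hypothetical. SoX infers the input and output formats from the file extensions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sox_command(sph_path: Path, out_dir: Path) -> list[str]:
    """Build a SoX command that converts one .sph file to a .wav file."""
    wav_path = out_dir / (sph_path.stem + ".wav")
    # Output format options go before the output filename. 16-bit signed
    # PCM is an assumption, since many downstream tools expect it.
    return ["sox", str(sph_path), "-b", "16", "-e", "signed-integer", str(wav_path)]


def convert_all(sph_dir: Path, out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Convert every .sph file under sph_dir and return the commands used."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for sph_path in sorted(sph_dir.rglob("*.sph")):
        cmd = sox_command(sph_path, out_dir)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if SoX fails on a file.
            subprocess.run(cmd, check=True)
    return commands
```

Running with `dry_run=True` first lists the commands without touching any audio, which is a cheap check before converting the whole corpus.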

Alignment

Switchboard is already aligned at the segment and word levels, but the alignments are stored as XML. A conversion script was written to parse the per-speaker XML files for each conversation and merge them into a single TextGrid file.
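The conversion can be sketched as below. This is a minimal sketch, not the original script. The XML schema (`<word>` and `<phone>` elements with `start`, `end` and `label` attributes) is hypothetical and would need to be adapted to the real Switchboard files. The tier names follow the FAVE-align convention of one `<speaker> - word` and one `<speaker> - phone` tier per speaker; we assume this is the layout PolyglotDB's FAVE importer expects. Praat interval tiers must cover the whole file, so gaps between aligned intervals are filled with empty intervals.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float
    label: str


def read_intervals(xml_text: str, tag: str) -> list[Interval]:
    """Collect intervals from <tag start=.. end=.. label=..> elements (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    return sorted(
        Interval(float(el.get("start")), float(el.get("end")), el.get("label", ""))
        for el in root.iter(tag)  # iter() also finds elements nested inside others
    )


def fill_gaps(intervals: list[Interval], xmax: float) -> list[Interval]:
    """Insert empty intervals so the tier tiles [0, xmax]; assumes no overlaps."""
    filled, t = [], 0.0
    for iv in intervals:
        if iv.start > t:
            filled.append(Interval(t, iv.start, ""))
        filled.append(iv)
        t = iv.end
    if t < xmax:
        filled.append(Interval(t, xmax, ""))
    return filled


def to_textgrid(tiers: dict[str, list[Interval]]) -> str:
    """Serialize named interval tiers as a long-format Praat TextGrid."""
    xmax = max((iv.end for ivs in tiers.values() for iv in ivs), default=0.0)
    lines = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
             "xmin = 0", f"xmax = {xmax}", "tiers? <exists>",
             f"size = {len(tiers)}", "item []:"]
    for i, (name, ivs) in enumerate(tiers.items(), start=1):
        ivs = fill_gaps(ivs, xmax)
        lines += [f"    item [{i}]:", '        class = "IntervalTier"',
                  f'        name = "{name}"', "        xmin = 0",
                  f"        xmax = {xmax}", f"        intervals: size = {len(ivs)}"]
        for j, iv in enumerate(ivs, start=1):
            label = iv.label.replace('"', '""')  # Praat escapes quotes by doubling
            lines += [f"        intervals [{j}]:", f"            xmin = {iv.start}",
                      f"            xmax = {iv.end}", f'            text = "{label}"']
    return "\n".join(lines) + "\n"


def merge_speakers(speaker_xml: dict[str, str]) -> str:
    """Merge per-speaker XML into one TextGrid with FAVE-style tier names."""
    tiers = {}
    for speaker, xml_text in speaker_xml.items():
        tiers[f"{speaker} - word"] = read_intervals(xml_text, "word")
        tiers[f"{speaker} - phone"] = read_intervals(xml_text, "phone")
    return to_textgrid(tiers)
```

For a conversation, `merge_speakers({"A": a_xml, "B": b_xml})` returns the TextGrid text, which can then be written to a `.TextGrid` file named after the conversation.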

Importer

Because the alignments were converted to TextGrid files, an existing PolyglotDB importer could be used, and no new importer had to be written.

Importing

The corpus was imported into PolyglotDB using the FAVE importer. Because of the size of the corpus, the Neo4j heap size had to be increased. Basic queries were then run to verify that the corpus had been imported correctly.
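Raising the Neo4j heap size means editing the server's configuration file. The helper below is a hypothetical sketch that rewrites `neo4j.conf` text. It assumes Neo4j 3.x/4.x, where the settings are `dbms.memory.heap.initial_size` and `dbms.memory.heap.max_size`. Neo4j 5 renamed them to `server.memory.heap.initial_size` and `server.memory.heap.max_size`, and Neo4j 2.x configured the heap elsewhere. The value actually used for Switchboard is not recorded here, so any size passed in (such as `"8g"`) is only an example.

```python
from __future__ import annotations

import re
from pathlib import Path

# Neo4j 3.x/4.x setting names (assumption: check against the installed version).
HEAP_KEYS = ("dbms.memory.heap.initial_size", "dbms.memory.heap.max_size")


def set_heap_size(conf_text: str, size: str, keys=HEAP_KEYS) -> str:
    """Return conf_text with each heap key set to size, e.g. "8g".

    The first line setting a key, commented out or not, is replaced in
    place; later lines for the same key are dropped; missing keys are
    appended at the end.
    """
    lines = conf_text.splitlines()
    for key in keys:
        pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}\s*=")
        out, done = [], False
        for line in lines:
            if pattern.match(line):
                if not done:
                    out.append(f"{key}={size}")
                    done = True
            else:
                out.append(line)
        if not done:
            out.append(f"{key}={size}")
        lines = out
    return "\n".join(lines) + "\n"


def update_conf_file(path: Path, size: str) -> None:
    """Rewrite a neo4j.conf file in place; restart Neo4j afterwards."""
    path.write_text(set_heap_size(path.read_text(), size))
```

Setting the initial and maximum heap to the same value avoids heap resizing during a long import. Neo4j only reads the new values after a restart.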