

The JASMIN-spraakcorpus (or JASMIN Speech Corpus) is a collection of about 115 hours of Dutch speech. The speakers are citizens and foreign residents of Flanders (about a third of recorded speakers) and of the Netherlands (about two thirds of recorded speakers). Native speakers are divided into three age groups (7-11, 12-16, and over 65), and non-native speakers are divided into two age groups (7-17 and 18-60); the non-native speakers were also tested for proficiency. The speakers are roughly gender-balanced in all age and nativeness groups. The corpus consists of read speech and speech from human-machine dialogue.

External Link


Cucchiarini, C., J. Driesen, H. Van hamme, and E. Sanders. 2008. "Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: the JASMIN-CGN Corpus". Proceedings of LREC 2008, Marrakech, Morocco, May 26 - June 1, 2008.