Analyze Script Example
This example shows how to use the analyze_script function on a corpus. This function takes a query and a Praat script as input, and executes the Praat script on the phones resulting from the query. It saves the result of the Praat script for each phone into the database, under a property name also specified by the user. The analyze_script function allows users to perform any type of acoustic analysis/enrichment on a subset of the phones in the database; all the user needs to do is write a Praat script to do the acoustic analysis they want.
In this example, we will calculate the center of gravity for all occurrences of the sibilant [ZH] in librispeech_medium, a subset of the Librispeech corpus. On librispeech_medium, the example takes about 1-2 hours to run.
This tutorial assumes that you already have PolyglotDB installed, and have librispeech_medium downloaded and imported into PolyglotDB. If not, there are instructions on how to do this [somewhere????]. You can also run this example on any other corpus that you have imported into PolyglotDB (with some modification of the corpus configuration and/or queries).
The code to run this example is located at examples/analyze_script_example.py inside of the PolyglotDB folder. The Praat script used in this example is in the same folder, and is named COG.praat.
The COG.praat script takes as input the path to a sound file, the beginning and end times of the phone to be analyzed, and two other arguments needed for extracting center of gravity using Praat. The script is run once for each phone in the database that matches the query given to analyze_script.
Any Praat script that you use as input to analyze_script must have, as its first three inputs, the following:
- the path to the sound file containing the phone
- the time of the beginning of the phone
- the time of the end of the phone
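The required argument order can be pictured as a command line. The following is a minimal sketch, not PolyglotDB's actual implementation: build_praat_command is a hypothetical helper, and the file names and times are placeholders. It only illustrates that the sound file path, begin time, and end time come first, followed by any script-specific arguments.

```python
def build_praat_command(praat_path, script_path, sound_file, begin, end, *extra_args):
    """Assemble an argument list for one phone (hypothetical helper).

    The first three script arguments are always the sound file path, the
    phone's begin time, and the phone's end time; anything else follows.
    """
    command = [praat_path, script_path, sound_file, str(begin), str(end)]
    command.extend(str(arg) for arg in extra_args)
    return command


# Placeholder values for illustration only.
example = build_praat_command("praat", "COG.praat", "utterance.wav", 1.25, 1.4, "1", "2")
print(example)
```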
The rest of the code in COG.praat can serve as a template for writing other Praat scripts to be used as input to analyze_script: it reads the file, creates a smaller file containing only the phone of interest, then does some acoustic analysis, and prints the result.
Any Praat script used as input to analyze_script must also print its result (the acoustic measurement or other property) to the Praat Info window, and must print nothing else. At this time, only scripts that output a single measurement are supported.
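To see why extra output is a problem, consider how a caller might turn the printed text into a number. This is a sketch under the assumption that the result is numeric and arrives as plain text; parse_single_measurement is a hypothetical name, not a PolyglotDB function.

```python
def parse_single_measurement(info_output):
    """Convert the text a script printed into one float (hypothetical helper).

    Raises ValueError if the output is empty, contains more than one
    token, or is not a number (Praat prints "--undefined--" for
    undefined values).
    """
    tokens = info_output.split()
    if len(tokens) != 1:
        raise ValueError("expected exactly one value, got %d" % len(tokens))
    return float(tokens[0])
```

A script that prints a label, a unit, or a second measurement alongside the value would make this kind of parsing fail, which is why the output has to be the bare measurement.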
Now let's run analyze_script_example.py. The first part of this script contains the paths to the files it needs and the configuration of the corpus to be used; change these to match your computer. praat_path should be the path to a command-line Praat program. (On Windows, this currently works only with Praatcon.exe, which is not part of very recent releases of Praat and must be downloaded separately.)
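Since the run takes a long time, it can be worth checking the paths before starting. The sketch below uses only the Python standard library; praat_path is the name used in the example script, while script_path and find_setup_problems are hypothetical names introduced here for illustration.

```python
import os
import shutil


def find_setup_problems(praat_path, script_path):
    """Return a list of readable problems; an empty list means the paths look usable."""
    problems = []
    # shutil.which accepts either a bare command name or a full path,
    # and returns None if nothing executable is found.
    if shutil.which(praat_path) is None:
        problems.append("Praat executable not found or not executable: " + praat_path)
    if not os.path.isfile(script_path):
        problems.append("Praat script not found: " + script_path)
    return problems
```

Calling find_setup_problems with your own praat_path and the path to COG.praat should return an empty list before you start the run.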
Once you have set up the correct paths for your computer, run the example script by typing the following into the terminal:
python analyze_script_example.py
Note that this will take 1 to 2 hours to run.
Within the main method, there is a query. The query in this example gets all instances of [ZH] in the corpus. This can be replaced with any query over phones. The query is used as input to analyze_script to determine which phones to analyze.
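A query of this kind might look like the following sketch. It assumes that corpus_context is an open PolyglotDB CorpusContext and that phones carry a label property; build_zh_query is a hypothetical wrapper, and the query in analyze_script_example.py may be written differently.

```python
def build_zh_query(corpus_context):
    """Return a query over all phones labelled 'ZH' (sketch).

    corpus_context is assumed to be an open polyglotdb CorpusContext.
    """
    g = corpus_context
    return g.query_graph(g.phone).filter(g.phone.label == 'ZH')


# Usage (requires PolyglotDB and an imported corpus; 'config' stands for
# the corpus configuration set at the top of the example script):
#
#     from polyglotdb import CorpusContext
#     with CorpusContext(config) as g:
#         q = build_zh_query(g)
```

To analyze a different set of phones, only the filter condition would need to change.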
To call analyze_script, the following arguments are needed:
- query object
- path to the Praat script to run
- name of the resulting measurement (in this case, 'COG' for center of gravity)
- call_back: use None if you don't want to use this
- stop_check: use None if you don't want to use this
- any number of additional arguments, all of which will be passed to the Praat script (in this case, '1' and '2')
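The argument order listed above can be illustrated with a stand-in function. analyze_script_stub below is not the PolyglotDB function: it is a hypothetical stub that only records what it receives, so the exact signature of the real analyze_script should be checked against analyze_script_example.py.

```python
def analyze_script_stub(query, script_path, result_name, call_back, stop_check, *praat_args):
    """Stand-in that only records its arguments; it is NOT the PolyglotDB function."""
    return {
        "query": query,
        "script_path": script_path,
        "result_name": result_name,
        "call_back": call_back,
        "stop_check": stop_check,
        "praat_args": praat_args,
    }


# 'q' would be the [ZH] query from the example; a placeholder string is used here.
call = analyze_script_stub("q", "COG.praat", "COG", None, None, "1", "2")
print(call["praat_args"])
```

Everything after stop_check is collected into praat_args, which mirrors how the '1' and '2' in the example are passed through to COG.praat after the three required inputs.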
Once analyze_script has been run, every [ZH] phone in the database will have a COG property extracted by the Praat script. This property can be accessed as g.phone.COG.
The rest of the code in analyze_script_example.py exports some data about these phones, including their COG measurements, to a CSV file called cog_data.csv.
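The general shape of such an export can be sketched with the Python standard library. This is not the code from analyze_script_example.py: export_phone_rows is a hypothetical helper, and the column names other than COG are assumptions chosen for illustration.

```python
import csv


def export_phone_rows(rows, path="cog_data.csv", fieldnames=("phone", "begin", "end", "COG")):
    """Write one CSV row per phone; each row is a dict keyed by fieldnames."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
```

Each row passed in would hold the values for one phone returned by the query, with the COG measurement in its own column.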
Following the same method, you can use analyze_script to add any acoustic measure you want to the database, provided you have a Praat script that extracts it.