Analyze Script Example
This example shows how to use the analyze_script function of PolyglotDB on a corpus. This function takes a phone class and a Praat script as input, and runs the Praat script on every phone in the specified phone class. It saves the result of the Praat script for each phone into the database, under a property name that is also specified by the user. The analyze_script function lets users perform any type of acoustic analysis or enrichment on a subset of the phones in the database; all the user needs to do is write a Praat script that performs the analysis they want.
In this example, we will calculate the center of gravity (COG) for all occurrences of sibilants in the Librispeech corpus, and export the results to a CSV file. The example takes about 2 hours to run.
Requirements & Setup
This tutorial assumes that you already have:
- PolyglotDB installed
- Librispeech or librispeech_medium downloaded and imported into PolyglotDB
- If on Windows, you will also need Praatcon.exe (it can be found here), if you do not already have it.
If you do not have these set up yet, there are instructions on how to do this [somewhere??]. (For now, the current instructions to install PolyglotDB are here, up to line 12.) You can also run this example on any other corpus that you have imported into PolyglotDB, with some modification of the corpus configuration and/or queries.
The code to run this example is located at examples/analyze_script_example.py inside the PolyglotDB folder. The Praat script used in this example is in the same folder, and is named COG.praat. You can change directory to PolyglotDB/examples in the terminal now.
How the Praat script works:
The COG (center of gravity) Praat script takes as input the path to a sound file to be analyzed. The other parameters needed to extract COG are set at the beginning of the script. This code will be run once on each phone in the database matching the query given to analyze_script.
Any Praat script that you use as input to analyze_script must do the following:
- have a form with exactly one input: the full path to the sound file containing the phone. (Any other parameters can be set manually within your script.)
- echo the resulting acoustic measurement or other property to the Praat Info window. (Only the acoustic measurement should be printed, nothing else. Currently, only Praat scripts which output one measurement are supported.)
The code in COG.praat can serve as a template for writing other Praat scripts to be used as input to analyze_script: it reads the file, does some acoustic analysis, and prints the result.
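The output requirement above (exactly one measurement echoed, nothing else) can be illustrated from the consuming side. The Python below is only a sketch and is not PolyglotDB's internal code: the helper name parse_single_measurement is hypothetical, and it assumes the text echoed by the Praat script is available as a string.

```python
def parse_single_measurement(praat_output):
    """Turn the text echoed by a Praat script into one float.

    Hypothetical helper for illustration only; not part of PolyglotDB.
    Raises ValueError if the output is not exactly one number.
    """
    tokens = praat_output.split()
    if len(tokens) != 1:
        raise ValueError(
            "expected exactly one measurement, got %d tokens" % len(tokens)
        )
    # float() itself raises ValueError for non-numeric text
    return float(tokens[0])
```

A script that echoes a label alongside the number (for example "COG: 5123.7") would fail this check, which is why the script should print the bare measurement only.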
Now let's run the example script:
1. The first part of this script contains the paths to various files needed to run the script, and the configuration of the corpus to be used. Change these to set up the script for your computer. The paths are:
- praat_path: the path to a command-line Praat program (either Praat on Mac/Linux or Praatcon on Windows; see above for help downloading Praatcon).
- script_path: the full path to the COG script, i.e. the COG.praat file in the examples folder.
- output_path: the full path and filename where the output CSV file from this example will be saved.
2. Once you have set the correct paths for your computer, run the example script from the terminal:
- change directory to PolyglotDB/examples
- type python analyze_script_example.py and hit enter to run the script. Note that this will take about 2 hours to run.
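Since the run takes about 2 hours, it may be worth checking the three paths from step 1 before starting. The following is a minimal sketch of such a check; the function name check_example_paths is hypothetical and is not part of the example script or of PolyglotDB.

```python
from pathlib import Path


def check_example_paths(praat_path, script_path, output_path):
    """Return a list of problems found with the three paths (empty if none).

    Hypothetical pre-flight check; not part of analyze_script_example.py.
    """
    problems = []
    # The Praat executable and the Praat script must already exist.
    if not Path(praat_path).is_file():
        problems.append("praat_path is not a file: %s" % praat_path)
    if not Path(script_path).is_file():
        problems.append("script_path is not a file: %s" % script_path)
    # The CSV file itself need not exist yet, but its folder must.
    if not Path(output_path).resolve().parent.is_dir():
        problems.append("folder for output_path does not exist: %s" % output_path)
    return problems
```

Calling check_example_paths(praat_path, script_path, output_path) with the values set in step 1 and printing the returned list shows any path that still needs fixing.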
How the Python script works:
To use analyze_script, the following arguments are needed:
- name of a phone class (which has already been encoded using encode_class; in this case, 'sibilant')
- path to the Praat script to run
- name of the resulting measurement (in this case, 'COG' for center of gravity)
- optional arguments: stop_check and call_back
Before running analyze_script on a new subset of phones (i.e. a phone class), you must encode the phone class. You can encode a class by calling encode_class on your corpus context. The encode_class function allows you to specify a set of phones and a label for that phone class. This label is used to tell analyze_script which phones to analyze. In this example, we analyze the class 'sibilant', which is encoded from the set of sibilant phones given in the example script.
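The two calls described above can be sketched as follows. This is an illustration under stated assumptions, not a copy of analyze_script_example.py: the wrapper function enrich_with_cog is hypothetical, the positional argument order is inferred from the descriptions in this section and may differ between PolyglotDB versions, and the list of sibilant phone labels must be supplied by you to match your corpus.

```python
def enrich_with_cog(corpus, sibilant_phones, script_path):
    """Encode the 'sibilant' phone class, then run the Praat script on it.

    Hypothetical wrapper for illustration. `corpus` is assumed to be an
    open PolyglotDB corpus context; `sibilant_phones` is a list of phone
    labels that you provide for your own corpus.
    """
    # Assumed argument order: the set of phones, then the class label.
    corpus.encode_class(sibilant_phones, 'sibilant')
    # Assumed argument order: phone class, Praat script path, property name.
    # The optional stop_check and call_back arguments are left at their defaults.
    corpus.analyze_script('sibilant', script_path, 'COG')
```

Encoding the class first matters because analyze_script selects phones by the class label, so the label must already exist in the database when it is called.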
Querying & Exporting the Results
Once analyze_script has been run, all the sibilant phones in the database will have a COG property that was extracted using the Praat script. This property can be accessed in PolyglotDB queries under the name COG.
The rest of the code in analyze_script_example.py exports some data about the phones we encoded COG for, including their COG measurements, to the CSV file specified by output_path.
Following the same method, you can use analyze_script to add any acoustic measure you want to the database, provided you have a Praat script which extracts it.
Checking the Results in R
You can type the following in R to see boxplots of COG for each sibilant.
    library(ggplot2)
    cogs <- read.csv("replace/with/path/to/cog_data.csv")
    ggplot(cogs, aes(x=phone_label, y=COG)) + geom_boxplot()
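If R is not available, a rough numeric check is also possible in Python using only the standard library. The sketch below assumes the exported CSV has the columns phone_label and COG, as the R snippet above does; the function name median_cog_by_phone is hypothetical.

```python
import csv
from collections import defaultdict
from statistics import median


def median_cog_by_phone(csv_path):
    """Return a dict mapping each phone label to its median COG.

    Assumes the CSV has 'phone_label' and 'COG' columns.
    """
    values = defaultdict(list)
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            try:
                values[row['phone_label']].append(float(row['COG']))
            except (TypeError, ValueError):
                # Skip rows whose COG value is missing or not a number.
                continue
    return {label: median(cogs) for label, cogs in values.items()}
```

The medians correspond to the center lines of the boxplots, so they give a quick way to compare sibilants without plotting.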