Analyze Script Example

Revision as of 21:12, 16 May 2017 by SMihuc

This example shows how to use PolyglotDB's analyze_script function on a corpus. The function takes a query and a Praat script as input and runs the Praat script on each phone returned by the query. It saves the script's result for each phone in the database, under a property name specified by the user. This lets users perform any type of acoustic analysis or enrichment on a subset of the phones in the database; all they need to do is write a Praat script that performs the analysis they want.

In this example, we will calculate center of gravity for all occurrences of the sibilant 'ZH' in librispeech_medium, a subset of the Librispeech corpus, and export the results to a CSV file. The example takes about 1-2 hours to run on librispeech_medium.

Requirements & Setup

This tutorial assumes that you already have:

  1. PolyglotDB installed
  2. librispeech_medium downloaded and imported into PolyglotDB
  3. Praatcon.exe downloaded, if you are on Windows (it can be found here)

If not, there are instructions on how to do this [somewhere??]. (For now, the current instructions to install PolyglotDB are here, up to line 12 or on the PolyglotDB Github page.) You can also run this example on any other corpus that you have imported into PolyglotDB (with some modification of the corpus configuration and/or queries).

The code to run this example is located at examples/ inside of the PolyglotDB folder. The Praat script used in this example is in the same folder, and is named COG.praat. You can change directory to PolyglotDB/examples in the terminal now.

Praat Script

How the Praat script works:

The COG (center of gravity) Praat script takes as input the path to a sound file, the beginning time and end time of the phone to be analyzed, and two other arguments needed for extracting center of gravity using Praat. This script will be run once on each phone in the database matching the query given to analyze_script.

Any Praat script that you use as input to analyze_script must have, as its first three inputs, the following:

  • the path to the sound file containing the phone
  • the time of the beginning of the phone
  • the time of the end of the phone

The rest of the code in COG.praat can serve as a template for writing other Praat scripts to be used as input to analyze_script: it reads the file, creates a smaller file containing only the phone of interest, then does some acoustic analysis, and prints the result.

Any Praat script used as input to analyze_script must also echo the resulting acoustic measurement or other property to the Praat Info window. Only the acoustic measurement should be printed, nothing else. At this time, only Praat scripts which output only one measurement are supported.
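
The input and output requirements above can be sketched from the calling side. The following is a minimal illustration in Python, not PolyglotDB's actual implementation: the helper names build_praat_arguments and parse_single_measurement are hypothetical, and the path, times, and output value are placeholders.

```python
def build_praat_arguments(sound_path, begin, end, *extra_args):
    """Return the argument list for one phone (hypothetical helper).

    The first three entries follow the required order: sound file path,
    phone begin time, phone end time. Extra arguments come afterwards.
    """
    if end <= begin:
        raise ValueError("phone end time must be later than its begin time")
    return [str(sound_path), str(begin), str(end)] + [str(a) for a in extra_args]


def parse_single_measurement(info_window_text):
    """Parse the script output, which must contain exactly one number."""
    tokens = info_window_text.split()
    if len(tokens) != 1:
        raise ValueError("expected exactly one measurement, got: %r" % info_window_text)
    return float(tokens[0])


# Placeholder values for illustration only.
args = build_praat_arguments("/path/to/sound.wav", 1.25, 1.5, "1", "2")
print(args)
print(parse_single_measurement("123.0\n"))
```

A script that prints a label or a second number alongside the measurement would fail the parsing step in this sketch, which is why only the measurement itself should be printed.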

Python Script

Now let's run the Python script.

1. The first part of this script contains the paths to various files needed to run the script, along with the configuration of the corpus to be used. Change these to set up the script for your computer. The paths are:

  • praat_path: should be the path to a command line Praat program (either Praat on Mac/Linux or Praatcon on Windows: see above for help downloading Praatcon).
  • script_path: the full path to the COG script. This should be the path to PolyglotDB/examples/COG.praat.
  • output_path: the full path and filename where the output CSV file from this example will be saved.
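
As a rough illustration of this setup step, the three paths might be defined and sanity-checked as below. This is a sketch only: every path shown is a placeholder to replace with a location on your own machine, and find_path_problems is a hypothetical helper that is not part of PolyglotDB or the example script.

```python
import os
import sys

# All three values are placeholders; replace them with paths on your machine.
if sys.platform.startswith("win"):
    praat_path = r"C:\path\to\praatcon.exe"   # command line Praat on Windows
else:
    praat_path = "/path/to/praat"             # Praat on Mac/Linux
script_path = "/path/to/PolyglotDB/examples/COG.praat"
output_path = "/path/to/output/cog_data.csv"


def find_path_problems(praat_path, script_path, output_path):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    if not os.path.isfile(praat_path):
        problems.append("praat_path is not a file: " + praat_path)
    if not os.path.isfile(script_path):
        problems.append("script_path is not a file: " + script_path)
    output_dir = os.path.dirname(os.path.abspath(output_path))
    if not os.path.isdir(output_dir):
        problems.append("output directory does not exist: " + output_dir)
    return problems


# With the placeholder paths above, this prints one line per problem found.
for problem in find_path_problems(praat_path, script_path, output_path):
    print(problem)
```

Because the full run takes 1 to 2 hours, checking the paths up front avoids discovering a mistyped path only when the script tries to use it.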

2. Once you have set up the correct paths for your computer, to run the example script, do the following in the terminal:

  • change directory to PolyglotDB/examples
  • type python followed by the name of the example script (and hit enter) to run it

Note that this will take 1 to 2 hours to run.

How the Python script works:

Within the main method, there is a query. The query in this example gets all instances of 'ZH' in the corpus. This can be replaced with any query over phones. The query is used as input to analyze_script to determine which phones to analyze.

To call analyze_script, the following arguments are needed:

  • query object: see here for more info
  • path to the Praat script to run
  • name of the resulting measurement (in this case, 'COG' for center of gravity)
  • call_back: use None if you don't want to use this
  • stop_check: use None if you don't want to use this
  • any number of additional arguments, all of which will be passed to the Praat script (in this case, '1' and '2')
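
To make the argument order concrete without needing a database, the sketch below uses a stand-in function that follows the list above. It is not PolyglotDB's analyze_script and makes no claim about the real signature; the name analyze_script_stand_in, the placeholder query string, and the script path are all assumptions for illustration.

```python
def analyze_script_stand_in(query, script_path, result_name,
                            call_back, stop_check, *praat_arguments):
    """Stand-in that only records how its arguments are grouped.

    This is NOT PolyglotDB's analyze_script; it mirrors the argument
    order listed above so the call shape can be seen without a database.
    """
    return {
        "query": query,
        "script_path": script_path,
        "result_name": result_name,
        "call_back": call_back,
        "stop_check": stop_check,
        # Everything after stop_check is forwarded to the Praat script.
        "praat_arguments": list(praat_arguments),
    }


call = analyze_script_stand_in(
    "placeholder for a query over 'ZH' phones",  # query object
    "/path/to/PolyglotDB/examples/COG.praat",    # path to the Praat script
    "COG",                                        # name of the stored property
    None,                                         # call_back not used
    None,                                         # stop_check not used
    "1", "2",                                     # extra Praat arguments
)
print(call["praat_arguments"])
```

The point of the sketch is the grouping: the first five positions are fixed, and anything after them is handed on to the Praat script following the sound file path and the phone's begin and end times.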

Querying & Exporting the Results

Once analyze_script has been run, all of the 'ZH' phones in the database will have a COG property that was extracted using the Praat script. This property can then be accessed in PolyglotDB queries.

The rest of the code in the script exports some data about the phones we encoded COG for, including their COG measurements, to a CSV file called cog_data.csv.
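
Once the CSV file exists, it can be inspected with a few lines of standard-library Python. The sketch below assumes the exported file has a header row and that the COG values sit in a column whose name you pass in; the actual column names depend on the export query, so check the header of your own file first. The helper mean_of_column is hypothetical.

```python
import csv
import statistics


def mean_of_column(csv_path, column):
    """Return the mean of a numeric column, skipping empty cells."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    return statistics.mean(values)
```

For example, if the COG column in cog_data.csv turned out to be named COG, calling mean_of_column("cog_data.csv", "COG") would give the average center of gravity across the exported 'ZH' phones.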

Following the same method, you can use analyze_script to add any acoustic measure you want to the database, provided you have a Praat script that extracts it.