Difference between revisions of "Analyze Script Example"

(changed example to reflect changes in input to analyze_script)

Revision as of 18:04, 24 May 2017

This example shows how to use the analyze_script function of PolyglotDB on a corpus. This function takes a query and a Praat script as input, and executes the Praat script on the phones resulting from the query. It saves the result of the Praat script for each phone into the database, under a property name also specified by the user. The analyze_script function allows users to perform any type of acoustic analysis/enrichment on a subset of the phones in the database; all the user needs to do is write a Praat script to do the acoustic analysis they want.

In this example, we will calculate center of gravity for all occurrences of sibilants in the Librispeech corpus, and export the results to a CSV file. It takes about 2 hours to run.

Requirements & Setup

This tutorial assumes that you already have:

  1. PolyglotDB installed
  2. Librispeech or librispeech_medium downloaded and imported into PolyglotDB
  3. Praatcon.exe, if you are on Windows (it can be found here, if you do not already have it)

If not, there are instructions on how to do this [somewhere??]. (For now, the current instructions to install PolyglotDB are here, up to line 12.) You can also run this example on any other corpus that you have imported into PolyglotDB (with some modification of the corpus configuration and/or queries).

The code to run this example is located at examples/analyze_script_example.py inside of the PolyglotDB folder. The Praat script used in this example is in the same folder, and is named COG.praat. You can change directory to PolyglotDB/examples in the terminal now.

Praat Script

How the Praat script works:

The COG (center of gravity) Praat script takes as input the path to a sound file to be analyzed. The other parameters needed to extract COG are set at the beginning of the script. This code will be run once on each phone in the database matching the query given to analyze_script.

Any Praat script that you use as input to analyze_script must do the following:

  • have a form with exactly one input: the full path to the sound file containing the phone. (Any other parameters can be set manually within your script.)
  • echo the resulting acoustic measurement or other property to the Praat Info window. (Only the acoustic measurement should be printed, nothing else. Currently, only Praat scripts which output one measurement are supported.)

The code in COG.praat can serve as a template for writing other Praat scripts to be used as input to analyze_script: it reads the file, does some acoustic analysis, and prints the result.
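
The output requirement above (exactly one measurement and nothing else in the Info window) can be checked before running a script over a whole corpus. The following is a minimal sketch in Python, not PolyglotDB's own parsing code: the helper name parse_measurement is hypothetical, and it assumes you have captured the text your Praat script printed for one sound file.

```python
def parse_measurement(info_text):
    """Return the single number printed by a Praat script.

    Hypothetical helper for checking a script's output by hand;
    it is not part of PolyglotDB.
    """
    tokens = info_text.split()
    # The script must print exactly one value and nothing else.
    if len(tokens) != 1:
        raise ValueError(
            "expected exactly one measurement, got %d tokens: %r"
            % (len(tokens), info_text)
        )
    # float() raises ValueError for non-numeric text such as
    # Praat's "--undefined--".
    return float(tokens[0])
```

A script that prints a label together with the value (for example "COG: 5321.7") fails this check, which is the kind of output the requirements above rule out.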

Python Script

Now let's run analyze_script_example.py.

1. The first part of this script contains the paths to various files needed to run the script, and the configuration of the corpus to be used. Change these to set up the script for your computer. The paths are:

  • praat_path: should be the path to a command line Praat program (either Praat on Mac/Linux or Praatcon on Windows: see above for help downloading Praatcon).
  • script_path: the full path to the COG script. This should be the path to PolyglotDB/examples/COG.praat.
  • output_path: the full path and filename where the output CSV file from this example will be saved.
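
As an illustration of the three settings above, here is a short sketch that assigns them and checks that they point at something usable before the long run starts. The path values are placeholders, not the values in analyze_script_example.py, and the helper find_path_problems is hypothetical.

```python
import os

# Placeholder values (assumptions); replace with the locations on your computer.
praat_path = "/usr/bin/praat"
script_path = os.path.join("PolyglotDB", "examples", "COG.praat")
output_path = os.path.join(os.path.expanduser("~"), "cog_data.csv")


def find_path_problems(praat_path, script_path, output_path):
    """Return a list of human-readable problems with the configured paths."""
    problems = []
    if not os.path.isfile(praat_path):
        problems.append("Praat executable not found: " + praat_path)
    if not os.path.isfile(script_path):
        problems.append("Praat script not found: " + script_path)
    # The CSV itself need not exist yet, but its folder must.
    out_dir = os.path.dirname(os.path.abspath(output_path))
    if not os.path.isdir(out_dir):
        problems.append("Output folder does not exist: " + out_dir)
    return problems


if __name__ == "__main__":
    for problem in find_path_problems(praat_path, script_path, output_path):
        print(problem)
```

Since the example takes about 2 hours, a check like this catches a mistyped path before any analysis begins.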

2. Once you have set up the correct paths for your computer, to run the example script, do the following in the terminal:

  • change directory to PolyglotDB/examples
  • type python analyze_script_example.py (and hit enter) to run the script. Note that this will take about 2 hours to run.

How the Python script works:

To call analyze_script, the following arguments are needed:

  • name of a phone class (which has already been encoded using encode_class, in this case 'sibilant')
  • path to the Praat script to run
  • name of the resulting measurement (in this case, 'COG' for center of gravity)
  • optional arguments: stop_check and call_back

You can encode a class by calling encode_class on your corpus context. The encode_class function allows you to specify a set of phones and a label for them. This label is used as the input for analyze_script to tell which phones to analyze.
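
The two calls described above can be sketched as follows. This is an illustration under stated assumptions, not a copy of analyze_script_example.py: the argument order for analyze_script follows the list above, the argument order for encode_class (phone set first, label second) is an assumption, the phone labels are illustrative ARPABET symbols, and RecordingCorpus is a stand-in that only records calls so the sketch can run without a database. In real use, the corpus argument would be your PolyglotDB corpus context.

```python
def enrich_with_script(corpus, phones, class_label, script_path, measurement):
    """Encode a phone class, then run a Praat script on every phone in it."""
    # Assumed order: the set of phones, then the label for the class.
    corpus.encode_class(phones, class_label)
    # Order as listed above: class label, script path, measurement name.
    corpus.analyze_script(class_label, script_path, measurement)


class RecordingCorpus:
    """Stand-in for a corpus context; NOT part of PolyglotDB."""

    def __init__(self):
        self.calls = []

    def encode_class(self, phones, label):
        self.calls.append(("encode_class", tuple(phones), label))

    def analyze_script(self, phone_class, script_path, result_measurement,
                       stop_check=None, call_back=None):
        self.calls.append(
            ("analyze_script", phone_class, script_path, result_measurement)
        )


if __name__ == "__main__":
    corpus = RecordingCorpus()
    # Illustrative sibilant labels (assumption); use your corpus's phone set.
    enrich_with_script(corpus, ["S", "Z", "SH", "ZH"], "sibilant",
                       "COG.praat", "COG")
    for call in corpus.calls:
        print(call)
```

The same label ('sibilant') is passed to both calls, which is what ties the encoded class to the phones that analyze_script processes.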

Querying & Exporting the Results

Once analyze_script has been run, every sibilant phone in the database has a COG property that was extracted using the Praat script. This property can be accessed in PolyglotDB queries as g.phone.COG.

The rest of the code in analyze_script_example.py exports some data about the phones we encoded COG for, including their COG measurements, to a CSV file called cog_data.csv.
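
Once the CSV file exists, it can be inspected with the Python standard library. The sketch below averages COG per phone label; the column names "phone" and "COG" are assumptions about the exported header, so adjust them to match the first line of your cog_data.csv.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_cog_by_phone(rows, phone_col="phone", cog_col="COG"):
    """Average the COG column per phone label.

    rows is any iterable of dicts, for example a csv.DictReader.
    Column names are assumptions; rows without a numeric COG are skipped.
    """
    values = defaultdict(list)
    for row in rows:
        try:
            cog = float(row[cog_col])
        except (TypeError, ValueError):
            continue  # empty or non-numeric measurement
        values[row[phone_col]].append(cog)
    return {phone: mean(cogs) for phone, cogs in values.items()}
```

To use it on the exported file, open cog_data.csv with open(output_path, newline="") and pass csv.DictReader(f) as rows.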

Following the same method, you can use analyze_script to add any acoustic measure to the database that you want, provided you have a Praat script which extracts it.