What is FAVE-align?

FAVE (Forced Alignment & Vowel Extraction) is a set of two programs: FAVE-align and FAVE-extract.

FAVE-align, based on the Penn Phonetics Lab Forced Aligner (P2FA), is a forced alignment program adapted for sociolinguistic interviews or other texts with multiple speakers. It accepts as input a sound file with its corresponding orthographic transcript, and returns a Praat TextGrid file with two tiers per speaker, a phone tier and a word tier. After processing, this output is automatically e-mailed to you.

At present, FAVE-align works with English-language data only.

Using FAVE-align involves three steps:


Orthographically transcribe the recording.

This can be done in any program, as long as the transcription is structured in tiers, with one tier per speaker. We recommend using ELAN or Praat. A short introduction on how to transcribe sociolinguistic interviews using ELAN can be found here.

Screen shot of ELAN transcription, using different tiers for each speaker.

Each speaker should be transcribed on a separate tier in short annotation units or breath groups. Overlapping sections of speech are represented by overlapping annotation units on different tiers, preserving the structure of the original conversation as closely as possible.

The transcription should use standard English orthography. It may also contain a number of special markup symbols, such as "{LG}" for laughter, "{NS}" for background noises, or double parentheses "(( ))" to indicate uncertain transcriptions or unintelligible segments of the conversation. Detailed transcription guidelines from the PNC (Philadelphia Neighborhood Corpus) project with a list of all the markup symbols used can be found here.

The transcription may also include a style tier to mark different speaking styles. If the style codes outlined in section 3.4 of the PNC transcription guidelines are used, they will be automatically converted to the corresponding Plotnik style codes by FAVE-extract. Instructions on how to set up and code the style tier in ELAN can be found here.

Export the transcript as a tab-delimited text file.

Once the transcription of the recording is completed, it should be exported as a tab-delimited text file. Each row of the transcript should correspond to one transcribed annotation unit and contain the following five columns:

  1. speaker ID
  2. speaker name
  3. beginning of annotation unit (in seconds)
  4. end of annotation unit (in seconds)
  5. transcribed text

Screen shot: format of the tab-delimited transcript file.

Note: Speaker ID can be any abbreviation used to designate the speaker. It is the second column, speaker name, that is actually used to name the tiers in the final aligned TextGrid.

In ELAN, the transcript can be exported in the desired format via File > Export As > Tab-delimited Text... Instructions on how to export transcripts in this format from ELAN can be found here.

For transcriptions done in Praat, the following Praat script can be used for conversion: Convert_To_FAVE-align_Input.praat.

Note: It is important that the input text file has this exact format. If a line contains more than five fields (for example, because you forgot to uncheck the "duration" check box in ELAN), the alignment will not work.
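Because the five-column format is strict, it can be worth checking the exported file before uploading it. The following is a minimal sketch of such a check, not part of FAVE-align itself; the function name `check_transcript_rows` and the sample rows are hypothetical, and the check on begin and end times is an extra sanity test that goes beyond what the text above requires.

```python
EXPECTED_FIELDS = 5  # speaker ID, speaker name, begin, end, transcribed text


def check_transcript_rows(lines):
    """Return (line_number, message) pairs for rows that look malformed."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != EXPECTED_FIELDS:
            problems.append(
                (line_number, f"expected 5 fields, found {len(fields)}")
            )
            continue
        try:
            begin, end = float(fields[2]), float(fields[3])
        except ValueError:
            problems.append((line_number, "begin or end is not a number"))
            continue
        if begin >= end:
            problems.append((line_number, "begin is not before end"))
    return problems


# Hypothetical rows: the second one carries an extra "duration" field.
sample = [
    "FS\tFirst Speaker\t0.50\t2.75\tso where did you grow up\n",
    "IV\tInterviewer\t2.60\t4.10\t{LG} right here\t1.50\n",
]
for line_number, message in check_transcript_rows(sample):
    print(f"line {line_number}: {message}")
```

To check a real export, the same function can be given an open file object, since iterating over a text file yields its lines.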

Out-of-dictionary words check

Generate a list of out-of-dictionary ("unknown") words.

The first step in forced alignment is to convert every word in the orthographic transcription into a phonemic transcription. The aligner does this by looking up each word in (a modified version of) the CMU Pronouncing Dictionary. By default, a word that has no entry in the dictionary is ignored by the forced aligner.

Screen shot: "Check transcription for unknown words" option.

To prevent this from happening, you can generate a list of out-of-dictionary words in your transcription by checking the "Check transcription for unknown words" option in the "Options" section of the aligner web page.

Screen shot: list of unknown and truncated words generated by the program.

This option performs a dictionary lookup for every word in the input transcription to check whether it has an entry in the CMU Pronouncing Dictionary. All words for which no entry is found, as well as all truncated words (words ending in "-"; see section 2.3 of the PNC transcription guidelines), are written to a file and sent back to the user. (Note that no actual alignment takes place when you select this option.)
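The lookup described above can be illustrated with a short sketch. This is not the code FAVE-align runs: the function name `find_unknown_and_truncated`, the tiny stand-in dictionary, and the way markup and punctuation are stripped are all assumptions made for illustration.

```python
import re


def find_unknown_and_truncated(text, dictionary):
    """Split words into out-of-dictionary words and truncated words.

    `dictionary` is any set of upper-case words standing in for the
    pronouncing dictionary (an assumption for this sketch).
    """
    # Drop curly-brace markup such as {LG} or {NS}, and the (( )) brackets
    # around uncertain stretches (assumed handling, for illustration only).
    cleaned = re.sub(r"\{[^}]*\}", " ", text)
    cleaned = cleaned.replace("((", " ").replace("))", " ")

    unknown, truncated = set(), set()
    for token in cleaned.split():
        word = token.strip('.,?!"').upper()
        if not word:
            continue
        if word.endswith("-"):
            truncated.add(word)  # truncated words end in "-"
        elif word not in dictionary:
            unknown.add(word)
    return sorted(unknown), sorted(truncated)


# Hypothetical four-word dictionary and annotation unit.
toy_dictionary = {"I", "WENT", "TO", "THE"}
unknown, truncated = find_unknown_and_truncated(
    "I went to the wawa {LG} ((to the)) sto-", toy_dictionary
)
print(unknown)    # ['WAWA']
print(truncated)  # ['STO-']
```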

Supply a list of transcriptions.

You can then go through this list, eliminate misspellings in the original transcript, and supply the missing transcriptions for the remaining words in a separate input transcription file.

Screen shot: "Import dictionary transcriptions" option.

To add your transcriptions to the dictionary, select the "Import dictionary transcriptions" option in the "Options" section of the alignment interface when re-uploading your files. (Don't forget to re-export your transcript as well if you corrected typos or misspellings in the original transcription.)

Screen shot: format of the input transcriptions file.

The uploaded file should contain two columns, one with the orthographic transcription, one with the phonemic transcription. All transcriptions must use the ARPAbet format (including stress digits for all vowels) and will be added to the dictionary only temporarily (i.e. for the alignment of your uploaded files only).
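Since every vowel must carry a stress digit, a quick check of each phonemic transcription can catch mistakes before upload. The sketch below assumes the phone inventory of the standard CMU Pronouncing Dictionary, upper-case phones separated by spaces, and stress digits 0, 1, and 2; the function name `invalid_phones` is hypothetical, and the modified dictionary used by FAVE-align may differ in detail.

```python
import re

# Phone set of the standard CMU Pronouncing Dictionary (assumed here).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}
ARPABET_CONSONANTS = {
    "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
    "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}


def invalid_phones(transcription):
    """Return the phones in a space-separated transcription that look wrong."""
    bad = []
    for phone in transcription.split():
        match = re.fullmatch(r"([A-Z]+)([012]?)", phone)
        if match is None:
            bad.append(phone)
            continue
        symbol, stress = match.groups()
        if symbol in ARPABET_VOWELS:
            if not stress:  # vowels need a stress digit
                bad.append(phone)
        elif symbol in ARPABET_CONSONANTS:
            if stress:  # consonants must not carry one
                bad.append(phone)
        else:
            bad.append(phone)  # not a known phone symbol
    return bad


print(invalid_phones("W AA1 W AA0"))  # []
print(invalid_phones("W AA W AH0"))   # ['AA']
```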

(You may also do another check run with both options selected at the same time, to make sure that you did not miss any out-of-dictionary words in your input file or introduce new ones while correcting the transcript file.)


Screen shot: aligned TextGrid.

Now you're ready for alignment proper! Upload the sound file, transcription file, and input file for the dictionary (if applicable). The resulting TextGrid will be sent as an attachment to your email address after it has been processed. It will contain two tiers per speaker: a phone tier and a word tier.

Note: Depending on web site traffic, the alignment may take a while. Do not expect an immediate response: the server may be busy, and forced alignment is very computationally intensive, especially for long sound files with many overlapping speakers.

The web site returned a .errorlog file - what's that?

If you chose not to upload a file supplying input transcriptions for missing out-of-dictionary words, these words will be ignored by the forced aligner. Depending on the length and content of the annotation unit in which these out-of-dictionary words are contained, this can cause the alignment to fail for this particular annotation unit. The aligner will continue aligning the rest of the file, but the alignment failure will be recorded in the .errorlog file for the user.

Also, the forced aligner occasionally (but very rarely) returns overlapping intervals, where the final boundary of the first of two adjacent intervals in a TextGrid tier overlaps with the beginning of the following interval by a millisecond. This can cause problems when navigating through the TextGrid in Praat. The web site therefore adds an entry to the .errorlog file to alert the user to the problem, which can be fixed simply by opening the TextGrid in Praat and adjusting the boundary in question.
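The overlap condition itself is easy to state in code. The sketch below works on a plain list of (start, end, label) tuples rather than on a TextGrid file, so it assumes the intervals of one tier have already been read in by some other means; the function name `find_overlaps` and the sample tier are hypothetical.

```python
def find_overlaps(intervals):
    """Find adjacent intervals where the first ends after the second begins.

    `intervals` is a list of (start, end, label) tuples for one tier,
    in seconds and sorted by start time. Returns a list of
    (index_of_first_interval, end_of_first, start_of_second) tuples.
    """
    overlaps = []
    pairs = zip(intervals, intervals[1:])
    for index, (first, second) in enumerate(pairs):
        end_of_first, start_of_second = first[1], second[0]
        if end_of_first > start_of_second:
            overlaps.append((index, end_of_first, start_of_second))
    return overlaps


# Hypothetical phone tier: the first interval runs 1 ms into the second.
tier = [(0.0, 1.251, "HH"), (1.25, 1.4, "AH0"), (1.4, 1.6, "L")]
print(find_overlaps(tier))  # [(0, 1.251, 1.25)]
```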