Joyrex Labs

Programs

The following list outlines the various scripts and programs developed by members of the LDP.

all

A simple script that updates all SVN repositories in use. It was more useful before the repositories were consolidated, but it can still be run from any directory.
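The behaviour of all can be sketched in Python as below. This is a minimal sketch, not the actual script. The repository locations in REPOS are hypothetical, because the page does not list them. Each working copy is passed to `svn update` as an absolute path, which is why the script can be run from any directory.

```python
import subprocess
from pathlib import Path

# Hypothetical working-copy locations; the real script's list is not documented here.
REPOS = [
    Path.home() / "svn" / "clan",
    Path.home() / "svn" / "ldp-scripts",
]


def update_commands(repos):
    """Build one `svn update` command per working copy, using absolute paths."""
    return [["svn", "update", str(Path(repo).resolve())] for repo in repos]


def update_all(repos, dry_run=False):
    for cmd in update_commands(repos):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: keep going if one repository fails to update.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Dry run by default in this sketch: only print what would be executed.
    update_all(REPOS, dry_run=True)
```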

add

A simple script that opens the specified .cut file from the $CLAN/lib/english/lex directory in the vi editor.
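A rough Python equivalent of add might look like the sketch below. It assumes two things the page does not state:

- the $CLAN environment variable is set;
- the file name may be given with or without its .cut suffix.

The helper names lex_file and edit_lex_file are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def lex_file(name, clan_root=None):
    """Path of a .cut file under $CLAN/lib/english/lex.

    Assumption: the name may be given with or without the .cut suffix.
    """
    root = clan_root or os.environ["CLAN"]
    if not name.endswith(".cut"):
        name += ".cut"
    return Path(root) / "lib" / "english" / "lex" / name


def edit_lex_file(name):
    # Hand the terminal to vi and wait for it to exit.
    subprocess.run(["vi", str(lex_file(name))], check=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    edit_lex_file(sys.argv[1])
```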

caps

Identifies all capitalized words and all ampersanded words in a CHAT file. It also flags capitalized words that also appear in lower case, and ampersanded words that also appear without an ampersand.

Usage:

caps xx.xx.cha

Sample output:

Capitalized words:

Alls
Bam             -->     check lower case
Caribou
Dad
Daddy
Flipper
Fuh
Ick             -->     check lower case
Kuh             -->     check lower case
La
Lilia
Mama            -->     check lower case
Mexican
Mom
Mommy
Museum_Of_Science_And_Industry
Pete
Seesee
Uhuh
Vroom           -->     check lower case
Vuh             -->     check lower case
Wah             -->     check lower case
Zs

Ampersanded words:

&fa             -->     check for unampersanded instances
&fuh
&la
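The check that caps performs can be sketched in Python as follows. The sketch makes three assumptions:

- Only main tiers are scanned, meaning `*SPK:` lines plus their tab-indented continuation lines. @ headers and % dependent tiers are skipped.
- Tokens are split on whitespace.
- A hypothetical set of edge punctuation is stripped from each token.

A capitalized word is flagged when its all-lower-case form also occurs. An ampersanded word is flagged when its bare form also occurs. The output layout mimics the sample above.

```python
import re
import sys

# Assumption: only main tiers ("*CHI:" lines plus tab-indented continuation
# lines) are scanned; @ headers and % dependent tiers are skipped.
SPEAKER = re.compile(r"^\*[A-Za-z0-9]+:\s*")
# Hypothetical choice of punctuation stripped from the edges of each token.
EDGE_PUNCT = ".,?!\"'()[]<>"


def main_tier_words(lines):
    in_main = False
    for line in lines:
        if line.startswith("*"):
            in_main = True
            line = SPEAKER.sub("", line)
        elif not (in_main and line.startswith("\t")):
            in_main = False
            continue
        for tok in line.split():
            tok = tok.strip(EDGE_PUNCT)
            if tok:
                yield tok


def caps_report(lines):
    words = set(main_tier_words(lines))
    caps = sorted(w for w in words if w[0].isupper())
    amps = sorted(w for w in words if w.startswith("&") and len(w) > 1)
    out = ["Capitalized words:", ""]
    for w in caps:
        # Flag a capitalized word if its all-lower-case form also occurs.
        out.append(f"{w:<16}-->     check lower case" if w.lower() in words else w)
    out += ["", "Ampersanded words:", ""]
    for w in amps:
        # Flag an ampersanded word if the bare form also occurs.
        out.append(f"{w:<16}-->     check for unampersanded instances"
                   if w[1:] in words else w)
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(caps_report(f.read().splitlines())))
```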

commitlex

A simple script that changes directory to the $CLAN/lib/english/lex directory, commits any changes to the svn repository, and returns to the previous working directory.
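The same effect can be sketched in Python as below. This is an illustration, not the script itself.

- The commit message parameter is hypothetical, since the page does not say how commitlex supplies one.
- Running svn with subprocess's `cwd=` argument replaces the cd-and-return step, because the calling process never leaves its own working directory.

```python
import os
import subprocess
from pathlib import Path


def lex_dir(clan_root=None):
    return Path(clan_root or os.environ["CLAN"]) / "lib" / "english" / "lex"


def commit_command(message):
    # Passing the log message with -m keeps svn from opening an editor.
    return ["svn", "commit", "-m", message]


def commit_lex(message, clan_root=None):
    # cwd= runs svn inside the lex directory without changing this process's
    # working directory, so there is nothing to restore afterwards.
    return subprocess.run(commit_command(message), cwd=lex_dir(clan_root), check=False)
```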

compare.py

Compares the tags of specified tiers in CHAT files.

The script compares a test tier against a gold standard tier. The tiers to be compared can be in the same file or separate files.
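The core of a test-versus-gold comparison can be sketched as below. This is a minimal sketch, not compare.py's actual code. It assumes each utterance yields one test tier string and one gold tier string whose whitespace-separated tags align one-to-one. Counting (test, gold) tag pairs gives tag accuracy and the mismatch list, similar in spirit to --mismatches. The same counts also give a confusion matrix like --matrix.

```python
from collections import Counter


def compare_tiers(pairs):
    """Compare aligned test and gold tiers, one (test, gold) string pair per utterance.

    Returns (accuracy, confusion) where confusion counts (test_tag, gold_tag) pairs.
    """
    confusion = Counter()
    for test, gold in pairs:
        test_tags, gold_tags = test.split(), gold.split()
        if len(test_tags) != len(gold_tags):
            raise ValueError(f"tiers not aligned: {test!r} vs {gold!r}")
        confusion.update(zip(test_tags, gold_tags))
    total = sum(confusion.values())
    correct = sum(n for (t, g), n in confusion.items() if t == g)
    return (correct / total if total else 0.0), confusion


def mismatches(confusion):
    """Only the (test, gold) pairs that disagree."""
    return {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}


def print_matrix(confusion):
    """Print a plain-text confusion matrix: rows are test tags, columns gold tags."""
    tags = sorted({tag for pair in confusion for tag in pair})
    print("test/gold".ljust(10) + "".join(tag.ljust(8) for tag in tags))
    for t in tags:
        print(t.ljust(10) + "".join(str(confusion[(t, g)]).ljust(8) for g in tags))
```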

Usage

To compare two syntax tiers in the same file (‘test.cha’):

compare.py --test TIER --gold TIER test.cha

To compare separate files (‘test.cha’ against ‘gold.cha’):

compare.py --test TIER --gold TIER test.cha gold.cha

Both --test and --gold default to ‘syn’, so the tier options can be omitted when comparing syntax tiers:

compare.py test.cha gold.cha

Options

-h, --help                show this help message and exit
-g <TIER>, --gold=<TIER>  tier to use as gold standard
-t <TIER>, --test=<TIER>  tier to test against gold standard
--mismatch=<TEST> <GOLD>  print lines with specified tag mismatches
--mismatches              print all mismatches
--matrix                  print confusion matrix
--labeled                 evaluate syntax tiers with dependency labels
--unlabeled               evaluate syntax tiers without dependency labels
--omit_single_words       exclude single word utts from evaluation
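A sketch of what --labeled and --unlabeled might compute for syntax tiers follows. It makes three assumptions, none taken from compare.py:

- Each tier item has the form `index|head|label`, as in `1|2|SUBJ`.
- Labeled scoring requires both head and label to match the gold tier. Unlabeled scoring requires only the head to match.
- For --omit_single_words, an utterance's length is taken to be the number of items on its gold tier.

```python
def parse_syn(tier):
    """Split 'index|head|label' items into (head, label) tuples (assumed format)."""
    deps = []
    for item in tier.split():
        _index, head, label = item.split("|")
        deps.append((head, label))
    return deps


def attachment_score(pairs, labeled=True, omit_single_words=False):
    """Fraction of words whose head (and, if labeled, label) matches the gold tier."""
    correct = total = 0
    for test, gold in pairs:
        t, g = parse_syn(test), parse_syn(gold)
        if len(t) != len(g):
            raise ValueError(f"tiers not aligned: {test!r} vs {gold!r}")
        # Assumption: utterance length is the number of items on the gold tier.
        if omit_single_words and len(g) == 1:
            continue
        for (t_head, t_label), (g_head, g_label) in zip(t, g):
            total += 1
            if t_head == g_head and (not labeled or t_label == g_label):
                correct += 1
    return correct / total if total else 0.0
```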

dic

dic is a bash script for looking up words in the CLAN lexicon. It will search for the term with word boundaries on either side. Thus searching for “how” will find “how”, “how_come”, or “know+how”, but not “howl”, “howitzer”, or “shower”.

The default output uses the same format as a CHAT transcript, i.e. pos|word(+comp|word...). Adding the -f switch shows output in the format of grep, i.e. filename:word {[scat pos]([comp comp+comp])}.

You can add the -p switch to search for parts of words. Thus dic -p how will find “how”, “how_come”, “know+how” AND “howl”, “howitzer”, and “shower”.
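The two matching modes can be illustrated with Python's re module. This is not dic's own pattern. It is a sketch that treats any non-letter character as a word boundary, which reproduces the examples above. A plain regex word boundary would not work here because it counts '_' as a word character, so it would reject "how_come". Restricting "letters" to ASCII is an assumption of the sketch.

```python
import re


def lexicon_pattern(term, partial=False):
    """Build a regex for dic-style lookup.

    Whole-word mode treats anything other than an ASCII letter as a boundary,
    so '_' and '+' separate words; partial mode (like -p) matches anywhere.
    """
    escaped = re.escape(term)
    if partial:
        return re.compile(escaped)
    return re.compile(rf"(?<![A-Za-z]){escaped}(?![A-Za-z])")


def lookup(term, words, partial=False):
    pat = lexicon_pattern(term, partial)
    return [w for w in words if pat.search(w)]


words = ["how", "how_come", "know+how", "howl", "howitzer", "shower"]
print(lookup("how", words))                # whole-word matches only
print(lookup("how", words, partial=True))  # every entry containing "how"
```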

Usage

dic how: Searches the lexicon for “how” with word boundaries. Output is in CHAT transcript format.
dic -f how: Searches the lexicon for “how” with word boundaries. Output is in grep format.
dic -p how: Searches the lexicon for any word containing the pattern “how”. Output is in CHAT transcript format.
dic -p -f how (or dic -f -p how): Searches the lexicon for any word containing the pattern “how”. Output is in grep format.
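The difference between the default and -f output formats can be sketched as below. The sketch assumes that a lexicon line starts with `word {[scat pos]`, based on the grep-format description above. It handles only simple entries: the `+comp|word` part of compound output is not reproduced. The function name format_hit is hypothetical.

```python
import re

# Assumed lexicon line shape: 'word {[scat pos]...' possibly followed by more fields.
ENTRY = re.compile(r"^(\S+)\s+\{\[scat ([^\]]+)\]")


def format_hit(filename, line, grep_style=False):
    if grep_style:
        # -f: mimic grep's 'filename:line' output.
        return f"{filename}:{line}"
    m = ENTRY.match(line)
    if not m:
        return line  # leave unrecognised lines untouched
    word, pos = m.groups()
    return f"{pos}|{word}"  # CHAT-style pos|word
```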

dis

fixlines

grasper

postal

root_nv

synflagger

syntax_extract

Helper programs

dict

fixpost.pl