Language Development Project transcripts consist of transcribed speech supplemented with various tiers of annotation. These annotations capture a range of phenomena associated with a given utterence (both within the utterance and the context in which it occurred).
Annotation tiers do not necessarily get created simulataneously with the speech transcription. A transcript may be supplemented with an additional “tier” of annotation at some later time.
For this reason, we needed to create a simple mechanism for noting what annotations are present or absent in a particular transcript. The annotation tracking mechanism we’ve devised is indeed quite simple: it consists of (1) this document, which names the various tiers of annotation currently recognized, and (2) an annotation worksheet associated with each transcript, indicating what annotation tiers are present or absent on that particular transcript (and for which speakers).
The table below indicates the annotation tiers currently recognized:
Tier | Phenomena annotated |
---|---|
Base | Basic structural/contextual info |
SR | Gesture |
GS | Gesture-speech relationship |
GX | Additional gesture-speech relations |
MOR | Morphology/part-of-speech |
SYN | Syntax |
MAX | Syntactic/semantic categories |
Each tier is described in more detail below. In particular, we note what transcript columns are included in each tier.
Note
If the annotation worksheet in a transcript file indicates that a particular tier is present, that implies that every column included in the tier’s definition is present – i.e., has been coded – in that transcript.
See the transcript column specification for details on particular columns.
The Base tier annotates the sequential/temporal order of utterances in a transcript as well as speaker/contextual information. The columns comprising this tier are speaker-independent and present on every transcript. This annotation tier is created by the original transcriber in the course of transcription.
Columns: | time, line, key, context |
---|
Note
Unlike the Base tier, all of the remaining annotation tiers are comprised of speaker-dependent columns. That means there are actually two columns indicated by each column name, one for the child speaker (indicated by a c_ prefix) and one for the parent speaker (indicated by a p_ prefix).
For example, when we say the GS tier includes the g_type column, one should note that this refers to both a column annotating parent utterances (p_g_type) and a column annotating child utterances (c_g_type).
The annotation worksheet included with each transcript file allows one to specify which speakers (parent and/or child) have a particular annotation tier. (Yes, there are occassions when a child speaker may get annotated but not a parent.)
The SR annotation tier captures basic gesture information.
Columns: | form, lrb, obj, gloss, orient, mspd |
---|
Columns: | gs_rel, g_type |
---|
Columns: | time, time_gen, word, word_num, sem_role, pres_ref, reinf, added, persp |
---|
Columns: | chat, enum, mor |
---|
Columns: | syn |
---|
Columns: | clauses, np, pp, dpp, wpu, passives, syntype |
---|