Annotation Guide
Intro
Overview
What’s Covered?
See Also
Setup
Xcode
Installing Xcode Tools on Mac OS X
Subversion
Creating directory for repos
Checking out the morphosyntax repo
Checking out the data repo
Checking out the code repo
Final steps for setting up repos
CLAN
Installing the GUI
Installing the command line tools
Edit your profile
Edit vimrc
Repository
bin
lib
clan
clan/lib
clan/bin
chat
chat/incoming
chat/coders
chat/final
chat/inserts
Programs
all
add
caps
commitlex
compare.py
Usage
Options
dic
Usage
dis
fixlines
grasper
postal
root_nv
synflagger
syntax_extract
Helper programs
dict
fixpost.pl
The Annotation Process
1. Navigate to your working directory
2. Use dump.py to generate the CHAT file
3. Clean up the file for morphosyntactic analysis
4. Reconcile words not found in the CLAN lexicon
4.1. Spelling errors
4.2. Compound words
4.3. Numbers
4.4. Applying the & and @ symbols to non-words or idiomatic words
4.5. Adding words to the CLAN lexicon
5. Run MOR to generate a .mor.cex file
6. Run the automatic part-of-speech disambiguator POST
7. Manually disambiguate the remaining ambiguous words in the .pst.cex file
8. Generate the syntax tier with GRASP
9. Run fixlines on the newly created syntax file
10. Flag potentially incorrectly coded utterances with graspParse.py
11. Search for flags and correct any problems
12. Commit the corrected syntax-coded transcript back into the SVN repository
Prepping a transcript
1. Remove disfluencies and repetitions
1.1. Remove utterances consisting only of #
2. Find and remove repeated words and phrases
2.1. Use the tn shortcut in VI to find single-word repetitions
2.2. Use the program dis to find repeated words and phrases
3. Add an & symbol to the beginning of non-words
4. Ensure that “Mom”, “Dad”, etc. are properly capitalized
4.1. Find family terms incorrectly written in lowercase using the `m shortcut
4.2. Find incorrectly capitalized family terms using the `d shortcut
5. Search for and correct common mistakes using the `v shortcut
5.1. Phrases that need or might need to be joined by a +
5.2. Phrases that need or might need to be changed or written with an apostrophe
5.3. Two capitalized words in a row
5.4. Punctuation placed in the middle of an utterance
5.5. Repetitions marked by [xN]
5.6. Single quotes (apostrophes) used as quotation marks
5.7. Apostrophes in proper nouns
5.8. Misuse of the @l marker
5.9. A space at the beginning of the tier
5.10. Utterances missing punctuation
6. Run caps to verify that capitalized words are proper nouns and are correctly formatted
Common problems
Abbreviations
Part of speech
The phrase go to sleep
Syntax
V NP to V
XCOMP
XMOD
XJCT
V NP V
AUX and INF as head of a verb cluster
DET as the head of a noun phrase
det used as JCT
Nominals in apposition
Pronouns used as determiners
Missing copulae
Missing copula vs. post-positioned MOD
Post-positioned MODs after -thing, -body, and -one words
SUBJ after ROOT
Post-positioned quantifiers
Perfect participle as PRED vs. passive construction
As PRED
As passive construction
Compound (multi-word) prepositions
Prepositional phrase as object of another preposition
Dangling COORDs
When to use PRED
Using the TAG GR
Negated auxiliary/copula tags
Sentence fragment with full verb
Full sentence with no grammatical connection
The phrase to be + done/finished + VERBing
Quantifiers and quantifier + of phrases
When to use VOC
Structure of a noun phrase
COMPs and CPREDs introduced by WH-words
all NP VERB is
The phrase be ADJ to VERB
The phrase good at VERBing
Clauses used as objects of prepositions
WH-word do you think CLAUSE
VI Shortcuts
General use shortcuts
Changing GR information
Changing part-of-speech information