While researchers who use our data end up publishing on a wide range of topics, many of them will require some information on the morphology and/or syntax of parent and child speech. To get this data, we use the CLAN software suite developed at Carnegie Mellon University. Your job as a syntax coder is to run this software, which automatically codes morphosyntactic information, and to correct its output where the automatic parser fails. At the end of this process, you will create a file that will be inserted back into the original transcript, which the gesture coders will use for their work.
The Child Language Data Exchange System, typically referred to by the acronym CHILDES, is a central repository for first language acquisition data. In addition to the contributed corpora, the CHILDES project has specified a format for transcription called CHAT and a suite of tools called CLAN for analyzing CHAT-formatted transcriptions.
Our workflow for morphosyntactic analysis relies heavily on the CLAN tools.
The following documents from the CHILDES web site at Carnegie Mellon are useful to have on hand for reference.
The CHAT guide provides a complete specification of the CHAT transcription format. See especially Sections 6 (Words) and 14 (Morphosyntactic Coding).
The CLAN guide provides a complete description of all of the CLAN tools.
This paper provides a nice overview of the process of morphosyntactic analysis with the CLAN tools. See especially Sections 6 (Analysis based on automatic morphosyntactic coding), 11 (Difficult Decisions), and 14 (GRASP).
Finally, CHILDES GR Annotation describes the grammatical relations we code.
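To make the morphosyntactic coding more concrete, here is a minimal Python sketch of how the coded tiers of a single utterance might be read. It is not part of CLAN or of our workflow. It assumes the conventional CHAT layout in which the speaker line starts with `*`, dependent tiers such as `%mor` and `%gra` start with `%`, and each `%gra` item has the form index|head|relation. The sample utterance is invented for illustration, and the names `read_tiers`, `parse_gra`, and `describe` are hypothetical helpers. The CHAT guide remains the authoritative reference for the format.

```python
from typing import Dict, List, NamedTuple

# Invented sample utterance, for illustration only (not from a real transcript).
SAMPLE = (
    "*CHI:\tI want cookie .\n"
    "%mor:\tpro:sub|I v|want n|cookie .\n"
    "%gra:\t1|2|SUBJ 2|0|ROOT 3|2|OBJ 4|2|PUNCT\n"
)


class Relation(NamedTuple):
    index: int  # 1-based position of the word in the utterance
    head: int   # position of the word's head; 0 marks the root
    label: str  # grammatical relation label, e.g. SUBJ


def read_tiers(utterance: str) -> Dict[str, str]:
    """Map each tier name (e.g. '*CHI', '%mor', '%gra') to its text."""
    tiers = {}
    for line in utterance.splitlines():
        if line.startswith(("*", "%")):
            name, _, body = line.partition(":")
            tiers[name] = body.strip()
    return tiers


def parse_gra(tier: str) -> List[Relation]:
    """Split a %gra tier body into index|head|relation triples."""
    relations = []
    for item in tier.split():
        index, head, label = item.split("|")
        relations.append(Relation(int(index), int(head), label))
    return relations


def describe(utterance: str) -> List[str]:
    """Pair each word with its head word and relation label.

    Assumption: the main tier has exactly one token per %gra item,
    which holds for this simple sample but not for every transcript.
    """
    tiers = read_tiers(utterance)
    main = next(body for name, body in tiers.items() if name.startswith("*"))
    words = main.split()
    lines = []
    for rel in parse_gra(tiers["%gra"]):
        head = "ROOT" if rel.head == 0 else words[rel.head - 1]
        lines.append(f"{words[rel.index - 1]} --{rel.label}--> {head}")
    return lines


if __name__ == "__main__":
    for line in describe(SAMPLE):
        print(line)
```

Reading the tiers this way can help when checking the parser's output by hand: each word should point to a sensible head, and exactly one word should have head 0.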