Lexical richness (or vocabulary diversity) has been measured in many ways, but all of these measures suffer from two major weaknesses. First, most words have a very low probability of occurrence, so estimates of word frequency and lexicon size depend systematically on sample size. Second, speakers do not sample words at random when speaking, but rather according to discourse context, topic, syntax, and other constraints; that is, frequency is an inaccurate indicator of most words’ likelihood of occurrence. We are seeking improved measures of lexical richness through a combination of statistical modeling techniques that address the first problem and simulations based on the speech transcripts that help ameliorate the second. We are also exploring the consequences of this approach, and of the problems that led us to develop it, for the interpretation and use of lexical richness measures in research. Of greater interest, we will describe the trajectory of lexical richness across caregiver and child speech and over the time course of the spontaneous speech samples (ultimately, 14 to 58 months).
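
The sample-size dependence described as the first weakness can be illustrated with a small simulation. The sketch below is not the modeling approach of this project. It assumes a hypothetical 5,000-word vocabulary whose word probabilities follow a simple Zipfian (1/rank) distribution, and it uses the type-token ratio as a stand-in for lexical richness measures in general. It also draws words independently at random, which is exactly what real speakers do not do, so it isolates the first weakness only. The names `zipf_weights`, `type_token_ratio`, and `simulate_ttr` are illustrative.

```python
import random


def zipf_weights(vocab_size):
    """Unnormalized Zipfian weights: the word of rank r gets weight 1/r."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def simulate_ttr(sample_sizes, vocab_size=5000, seed=0):
    """Draw one sample of each size from the same assumed vocabulary.

    Words are sampled independently (an assumption made for illustration;
    real speech is constrained by topic, discourse, and syntax).
    Returns a dict mapping sample size to type-token ratio.
    """
    rng = random.Random(seed)
    weights = zipf_weights(vocab_size)
    vocab = range(vocab_size)  # words are represented by their rank index
    results = {}
    for n in sample_sizes:
        tokens = rng.choices(vocab, weights=weights, k=n)
        results[n] = type_token_ratio(tokens)
    return results


if __name__ == "__main__":
    for n, ttr in simulate_ttr([100, 1000, 10000]).items():
        print(f"{n:>6} tokens: TTR = {ttr:.3f}")
```

Although every sample is drawn from the same underlying vocabulary, the type-token ratio falls as the sample grows, because frequent words repeat while rare words are added only slowly. Under these assumptions, a speaker's measured richness would therefore differ between a short and a long transcript even when nothing about the speaker has changed.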