Research on the literacy and language development of Spanish-speaking English language learners

Research Core


To address the goals of the VIAS program, the Research Core worked closely with subproject researchers to select common measures that adequately capture the constructs of interest. The Research Core also provided technical assistance related to data collection and analysis. With regard to the development of assessments and protocols, the role of the Research Core was to consider whether new measures were needed to answer the research questions posed by the program and to develop and validate these measures. In deciding which measures were needed, the Research Core considered whether these new measures would benefit the research community beyond the program project. Measures that were developed include the Test of Academic Vocabulary in English, Assessment of Multi-word Knowledge, Word Associations Test, and Test of Homonym Knowledge.

Research Goal

The goal of the Research Core was to support the subprojects in selecting and developing assessments and protocols and to provide them with technical support in the area of data analysis.

Description of Project Work and Findings

The primary task of the Research Core was to develop a series of measures assessing different aspects of vocabulary knowledge: an assessment of students’ knowledge of vocabulary that appears frequently in grade-level text (Test of Academic Vocabulary in English); an assessment of students’ knowledge of high-frequency multi-word units (Assessment of Multi-word Knowledge); an assessment of students’ knowledge of word meanings and word associations (Word Associations Test); and an assessment of students’ knowledge of the meanings of homonyms, i.e., words that share the same spelling and pronunciation but have different meanings (Test of Homonym Knowledge). The Research Core also developed a questionnaire for teachers to measure their knowledge of vocabulary development and instruction.

Over the course of the program project, the assessments were developed, tested in a series of cognitive labs, and revised. To establish the reliability and validity of the measures, we collected data on 1,450 English language learner (ELL) students, former ELL students, and native English-speaking students in Grades 3-8 in a large urban district in the southwestern United States. The Gates-MacGinitie Reading Test (GMRT) vocabulary subtest and the Test of Silent Word Reading Fluency (TOSWRF) were also administered to help validate the researcher-developed measures. Additionally, the teachers of the participating students completed a protocol that collected information on students’ schooling (e.g., language of literacy instruction, special needs, mobility, early childhood experiences, and levels of English language proficiency). In the following sections we briefly describe the development of the measures.

Test of Academic Vocabulary in English (TAVE)

Current vocabulary measures show how students compare with one another, but their scores do not translate into information that is meaningful for instruction, notably how well students are likely to know vocabulary that appears frequently in grade-level texts and which types of words will be challenging for them.

The TAVE assessment consists of four mini-tests, each composed of three units. Each unit contains four items and a word bank with nine words, four of which are target words and five of which are distracters. Items consist of a definition and a cloze sentence that provides context for the target word. Participants are instructed to select a word from the word bank that matches the definition and completes the cloze sentence. Each mini-test takes less than one hour to complete and the full TAVE is administered over four sessions.

In order to identify academically important vocabulary words for different grade levels, a database of academic vocabulary was developed by combining information from the Educator’s Word Frequency Guide (Zeno, Ivens, Millard, & Duvvuri, 1995) and the Living Word Vocabulary (Dale & O’Rourke, 1981) into a single database. Each word meaning was coded for attributes that influence its acquisition. We then obtained empirical estimates of item difficulty for 222 target vocabulary word meanings through two separate pilot studies. Next, we constructed a model (referred to as model 1) relating these empirical item difficulties to the item characteristics in our database. Estimated difficulties based on the model had a moderate correlation with empirical difficulties.
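As a rough illustration of relating empirical item difficulties to coded item characteristics, the sketch below fits an ordinary least-squares line with a single predictor and then correlates the model’s estimates with the empirical values. The source does not state the form of model 1 or which attributes entered it, so the predictor (log word frequency), the function names, and all data values here are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical pilot data: one coded attribute and one empirical
# difficulty per word meaning (higher difficulty = harder item).
log_frequency = [3.2, 2.8, 2.1, 1.9, 1.4, 0.9]
empirical_difficulty = [-1.1, -0.4, -0.6, 0.5, 0.3, 1.2]

intercept, slope = fit_line(log_frequency, empirical_difficulty)
estimated_difficulty = [intercept + slope * x for x in log_frequency]

# Agreement between model-based and empirical difficulties.
r = pearson_r(estimated_difficulty, empirical_difficulty)
print(f"slope={slope:.2f}, r={r:.2f}")
```

With several coded attributes, the same idea would extend to multiple regression; the single-predictor version is used here only to keep the sketch self-contained.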

To construct the field test, we used this mathematical model to calculate estimated difficulty values for the remaining word meanings. Based on the estimated difficulty of each word meaning, the word meanings were grouped into difficulty quartiles at each target grade level (Grades 3-8). Next, word meanings were randomly selected from each quartile to reflect the distribution in the database with regard to part of speech and morphology. In order to vertically scale the forms used in the field test, two units from the grade level below and one unit from the grade level above were randomly chosen to be administered to the target grade level along with the grade-appropriate units. Child-friendly definitions and cloze sentences were developed for the selected words, drawing on a variety of high quality dictionaries for children and English language learners.
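The quartile grouping and random selection described above can be sketched as follows. This is a simplified illustration, not the procedure actually used: it ignores the additional balancing by part of speech and morphology, and the word-meaning labels, difficulty values, and function names are all hypothetical.

```python
import random

def assign_quartiles(estimates):
    """Map each word meaning to a difficulty quartile (1 = easiest)."""
    ranked = sorted(estimates, key=estimates.get)
    n = len(ranked)
    return {word: (rank * 4) // n + 1 for rank, word in enumerate(ranked)}

def sample_per_quartile(quartiles, per_quartile, seed=0):
    """Randomly draw the same number of word meanings from each quartile."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen = {}
    for q in (1, 2, 3, 4):
        pool = sorted(w for w, wq in quartiles.items() if wq == q)
        chosen[q] = rng.sample(pool, per_quartile)
    return chosen

# Hypothetical model-estimated difficulties for one grade level.
estimated = {
    f"meaning_{i:02d}": d
    for i, d in enumerate(
        [0.4, -1.3, 1.9, -0.2, 0.9, -0.8, 1.4, -1.7, 0.1, 2.3, -0.5, 0.7]
    )
}

quartiles = assign_quartiles(estimated)
selected = sample_per_quartile(quartiles, per_quartile=2)
for q, words in selected.items():
    print(q, words)
```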

We used item response theory (IRT) to conduct psychometric analyses of the new vertically scaled TAVE. Analysis of the field test data showed that the assessment appears highly reliable. As evidence of concurrent validity, student factor scores on the TAVE correlated highly with Gates-MacGinitie word knowledge extended scale scores. Additionally, analyses suggest that the test measures a single factor.
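For readers unfamiliar with IRT, the sketch below shows the item response function of the one-parameter (Rasch) model, in which the probability of a correct answer depends on the difference between a student’s ability and an item’s difficulty on a common scale. The source does not say which IRT model was fitted to the TAVE, so the choice of the Rasch model and the example values are assumptions made only for illustration.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values on a shared (vertically scaled) logit metric.
student_ability = 0.5
item_difficulties = {"easier item": -1.0, "matched item": 0.5, "harder item": 2.0}

for label, difficulty in item_difficulties.items():
    p = rasch_probability(student_ability, difficulty)
    print(f"{label}: {p:.2f}")
```

Because abilities and difficulties sit on one scale, items from adjacent grade levels can be compared directly, which is what makes vertical scaling useful.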

Assessment of Multi-Word Knowledge

We developed a twenty-four item test, the Assessment of Multi-word Knowledge, to assess children’s knowledge of high utility multi-word units (MWUs).

To generalize back to a meaningful corpus, we used corpus-based methods to identify four-word MWUs in a set of Grade 5 academic written texts drawn from a corpus of approximately 20 million words. A total of 260 MWUs were identified, categorized by discourse function (Biber, Conrad, & Cortes, 2004; Simpson-Vlach & Ellis, 2010), and rated for linguistic importance and importance for teaching. The highest-rated referential and discourse-organizing MWUs that were frequent in the corpus were selected for the assessment.
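A minimal sketch of one common corpus-based approach, counting recurring four-word sequences, is shown below. The source does not describe the extraction tools or the frequency threshold that were used; the tokenizer, the `min_count` threshold, and the sample texts are hypothetical.

```python
from collections import Counter
import re

def four_word_units(text):
    """Return every sequence of four consecutive word tokens in a text."""
    # Simplification: punctuation is dropped, so sequences may span sentences.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

def frequent_units(texts, min_count):
    """Count four-word sequences across texts and keep the frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(four_word_units(text))
    return {unit: c for unit, c in counts.items() if c >= min_count}

# Hypothetical stand-in for a corpus of grade-level texts.
sample_texts = [
    "As a result of the storm, the river rose. "
    "As a result of the flood, roads closed.",
    "The town changed as a result of the new law.",
]

candidates = frequent_units(sample_texts, min_count=3)
print(candidates)
```

In practice, candidate units found this way would still need the categorization and expert rating steps described above before being selected for an assessment.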

An item type similar to one developed by Revier (2009) was used as a basis for creating the items for the assessment. This item type was judged to measure productive knowledge of collocations that are similar in structure to MWUs (Shillaw, 2009). The item type included a cloze sentence with a word bank. In order to develop an item, a cloze sentence in which the MWU would fit was written. All sentences were rated with the Lexile Analyzer to ensure that they were appropriate for Grade 3 students; any sentence that was rated as too complex was modified accordingly. Ongoing data analysis is exploring the meaning of the scores of the assessment and their relationship to other measures developed by the Research Core.

Word Associations Test of Academic Vocabulary in English

Although depth of word knowledge has been shown to be as important as breadth of word knowledge in reading performance and comprehension (Shen, 2008), few assessments distinguish depth from breadth or measure depth in school-age ELLs (Schmitt, Ng, & Garras, 2011). The measures that do exist for school-age children (e.g., Schoonen & Verhallen, 2008) do not control for type of lexical association: subordinate, superordinate, or synonym. The ten-item Word Associations Test of Academic Vocabulary in English was designed to assess children’s knowledge of the decontextualized lexical associations of high-frequency English vocabulary, controlling for type of lexical association.

Drawing from Schoonen and Verhallen (2008), each item consists of a central word (the stimulus) that is surrounded by six other words. Three of the six words are correct associations (keys) and three of the words are considered incorrect associations (distracters). The keys are decontextually related to the stimulus (i.e., they have an inherent relation to the stimulus independent of context), while the distracters are context-bound and syntagmatically related to the stimulus. Each key in each item represents one of the three possible paradigmatic relations: a subordinate relationship, a superordinate relationship, and a synonymous relationship. Children are prompted to draw a line to the three words that are associated with the stimulus irrespective of context.
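The item structure can be represented as in the sketch below, which also shows one plausible way to score a response. The example item is invented for illustration and is not taken from the test, and the scoring rule (one point per key selected) is an assumption, since the source does not describe how responses were scored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssociationItem:
    stimulus: str
    keys: dict          # relation type -> key word (paradigmatic associations)
    distracters: tuple  # context-bound (syntagmatic) associations

def score_item(item, selected):
    """Return the number of keys among the child's selections (0 to 3)."""
    chosen = set(selected)
    return sum(1 for word in item.keys.values() if word in chosen)

# Invented example item; not an actual test item.
item = AssociationItem(
    stimulus="animal",
    keys={
        "synonym": "creature",
        "superordinate": "organism",
        "subordinate": "dog",
    },
    distracters=("zoo", "wild", "feed"),
)

# A child who draws lines to two keys and one distracter.
print(score_item(item, ["creature", "dog", "zoo"]))
```

Storing each key under its relation type makes it straightforward to tabulate performance separately for synonym, superordinate, and subordinate associations.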

Analyses were conducted to assess the psychometric functioning of this non-standard item format for assessing vocabulary knowledge. We first investigated the structure of the synonym, subordinate, and superordinate items by comparing a three-factor model (i.e., one factor for each type of relationship, positing that knowledge of the three relationships is distinct) with a one-factor model (i.e., that only one type of knowledge about vocabulary is related to performance on the test). Because three keys were selected for each stimulus word, we modeled item-level covariance to account for this dependence. We conducted analyses separately for two groups (non-ELLs and ELLs). Because of the high correlation between the factors, the three-factor model did not yield reasonable results in either group.

However, the single-factor model fit well in both groups, suggesting it is a reasonable structure to investigate further. Using this structure, concurrent validity was investigated with respect to two other measures of word knowledge: the Gates-MacGinitie Passage Vocabulary and Test of Silent Word Reading Fluency. Findings suggest that the item format functions well as an indicator of general word knowledge.

Test of Homonym Knowledge (THK)

English language learners can have difficulty with English vocabulary in part because many English words sound and look alike but have different meanings. The ten-item Test of Homonym Knowledge (THK) assesses children’s knowledge of the meanings of homonyms. It was designed to test the hypothesis that students in the lower grades tend to know the word meanings common at lower grade levels, whereas students in the upper grades tend to know multiple meanings for each word. Each item includes a single word (the stimulus) and six candidate definitions, consisting of three keys and three distracters. The three keys are homonymous definitions that are unrelated to each other. Of the three keys, one is a meaning that students at Grade 4 are likely to know, one is a meaning that students at Grades 4 and 6 are likely to know, and one is a meaning that students at Grades 4, 6, and 8 are likely to know. The three distracters are not definitions of the stimulus.

In order to generalize to a meaningful corpus, stimulus word forms for the THK were drawn from the Educator’s Word Frequency Guide (Zeno et al., 1995) and have a U value between 10 and 999 at Grades 3, 5, and 7. The keys associated with each word form were chosen from the Living Word Vocabulary (LWV) database (Dale & O’Rourke, 1981). Stimuli are presented as single words without context above a randomized list of keys and distracters. Students are instructed to choose three different meanings that always go with the word at the top.
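The frequency criterion for stimulus word forms amounts to a simple filter, sketched below. The U values shown are invented placeholders rather than figures from the Educator’s Word Frequency Guide, and treating the 10 to 999 range as inclusive is an assumption.

```python
def eligible_stimuli(u_values, grades=(3, 5, 7), low=10, high=999):
    """Keep word forms whose U value is within [low, high] at every grade."""
    return sorted(
        word
        for word, by_grade in u_values.items()
        if all(low <= by_grade.get(g, 0) <= high for g in grades)
    )

# Invented U values by grade level; for illustration only.
u_values = {
    "bank": {3: 45, 5: 60, 7: 72},
    "bat": {3: 30, 5: 25, 7: 18},
    "pitch": {3: 8, 5: 15, 7: 22},        # too rare at Grade 3
    "the": {3: 60000, 5: 61000, 7: 62000},  # far above the upper bound
}

print(eligible_stimuli(u_values))
```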

Distracters are matched to keys based on syntax and length, and were constructed such that there are an equal number of options for each part of speech (e.g., three nominal options and three verbal options). The distracters are constructed meanings and not related to actual words in any database; however, all of the words within both the keys and distracters appear in the LWV at the fourth grade level. Keeping all key and distracter options at the fourth grade level ensures that word comprehension does not impede performance on the task. Distracters were constructed to be matched to keys in terms of content: each distracter is vaguely related to one of the keys by collocation or theme. However, none of the distracters are possible meanings for the stimuli. Ongoing data analysis is exploring the meaning of the scores on the assessment and their relationships with other measures used in the Research Core.

Development of Cross-Project Questionnaires and Observational Protocols

The Research Core also focused on the development of research protocols used in the subproject studies. These included a demographic survey that elicits information about children’s schooling history, parents’ background, and home language and literacy practices; a teacher questionnaire that collects information on teachers’ backgrounds and qualifications and on students’ instructional history; two protocols that assess teachers’ language and literacy proficiency in English and Spanish; and an instrument used for classroom observations.

The Research Core team also created a new survey to collect information about teachers’ knowledge of vocabulary development and instruction. The Teacher Knowledge of Vocabulary Survey (TKVS) adds to previous research on teacher knowledge about reading, including the Teacher Knowledge about Reading and Reading Practices (TKRRP) assessment designed by Joanne Carlisle and colleagues (Carlisle, Correnti, Phelps, & Zeng, 2009). The TKVS measures teacher knowledge of vocabulary development and instruction, including the vocabulary development and instruction of English learners. The initial pool of TKVS items was created from statements about vocabulary development and knowledge found in Graves (2006) and the National Reading Panel report (NICHD, 2000). A total of 266 statements were drafted and then rated for inclusion by two subject matter experts, based on the value of a statement in expressing knowledge worth testing, its quality as a test item, and the degree to which the statement was clearly true. Sixty-one statements were retained and used in a cognitive lab. Items that were unclear to participating teachers were revised or deleted. The final 52 operational items for the pilot included 28 true items and 24 false items, grouped into the following six categories: vocabulary development; vocabulary instruction: providing rich and varied language experiences; vocabulary instruction: teaching individual words; vocabulary instruction: teaching word-learning strategies; vocabulary instruction: fostering word consciousness; and vocabulary instruction for English learners.

The TKVS was piloted with 50 teachers who participated in the subproject 3 studies. Participants took the survey online and indicated whether they believed each statement was “True”, “False”, or “I don’t know.” Independent of the teachers’ performance, subject experts rated each item’s difficulty in terms of the probability of borderline teachers having the knowledge to judge each statement correctly. The experts’ ratings were used to create cut scores to identify three interpretative levels of teacher knowledge of vocabulary development and instruction based on performance on the survey: emergent, intermediate, and expert. The majority (34) of the teachers were found to be in the intermediate category. Teacher performance was found to correlate with the experts’ predicted difficulties. Rasch analyses also indicated that the statements worked together well as a measure of teacher knowledge.
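One way that probability ratings for borderline examinees can be turned into cut scores is an Angoff-style calculation, sketched below: the experts’ ratings are averaged for each item and then summed across items. The source does not spell out the exact procedure used for the TKVS, so this method, the five-item survey, the ratings, and the function names are assumptions for illustration only.

```python
def cut_score(ratings_by_expert):
    """Sum across items of the mean expert probability (Angoff-style)."""
    n_experts = len(ratings_by_expert)
    n_items = len(ratings_by_expert[0])
    return sum(
        sum(expert[i] for expert in ratings_by_expert) / n_experts
        for i in range(n_items)
    )

def classify(raw_score, intermediate_cut, expert_cut):
    """Assign one of the three interpretive levels to a raw score."""
    if raw_score >= expert_cut:
        return "expert"
    if raw_score >= intermediate_cut:
        return "intermediate"
    return "emergent"

# Hypothetical ratings from two experts for a five-item survey: the
# probability that a borderline teacher at each boundary answers correctly.
borderline_intermediate = [
    [0.5, 0.4, 0.6, 0.3, 0.5],
    [0.6, 0.5, 0.5, 0.4, 0.4],
]
borderline_expert = [
    [0.9, 0.8, 0.9, 0.7, 0.8],
    [0.8, 0.9, 0.9, 0.8, 0.7],
]

intermediate_cut = cut_score(borderline_intermediate)
expert_cut = cut_score(borderline_expert)

for raw in (2, 3, 5):
    print(raw, classify(raw, intermediate_cut, expert_cut))
```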

These initial results provide evidence for the validity of the TKVS as a measure of teacher knowledge of vocabulary development and instruction and as a tool to identify gaps in the teachers’ knowledge. Based on their performances on the TKVS, teachers in the pilot appeared to need more instruction on general vocabulary development and the instruction of vocabulary for English learners. A manuscript in preparation describes the development and initial validation of the survey instrument as well as implications for the use of the TKVS, including as a needs assessment or guide for professional development for both preservice and inservice teachers.

Selected Dissemination Activities

Findings from the pilot were presented at the International Association for the Study of Child Language 2011 conference. Dr. August and Ms. Artzi joined Dr. David Francis (University of Houston) in presenting findings related to the assessments developed by the Research Core at the annual conference for the National Center for Research on Education and Assessment in Teaching English Learners (CREATE center) in October 2012. Additionally, at the 2012 CREATE conference, Dr. August and Ms. Artzi conducted a workshop for practitioners drawing on the different measures developed in the Research Core. Dr. Wright presented findings regarding the MWU project at the Georgetown University Roundtable in March 2012.


References

Biber, D., Conrad, S., & Cortes, V. (2004). If you look at...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.

Carlisle, J. F., Correnti, R., Phelps, G., & Zeng, J. (2009). Exploration of the contribution of teachers’ knowledge about reading to their students’ improvement in reading. Reading and Writing, 22, 457–486.

Dale, E., & O’Rourke, J. (1981). The living word vocabulary. Chicago: World Book/Childcraft International.

Graves, M. F. (2006). The vocabulary book: Learning and instruction. New York: Teachers College Press.

National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769). Washington, DC: U.S. Government Printing Office.

Revier, R. L. (2009). Evaluating a new test of whole English collocations. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in another language. New York: Palgrave.

Schmitt, N., Ng, J. W. C., & Garras, J. (2011). The word associates format: Validation evidence. Language Testing, 28(1), 105–126.

Schoonen, R., & Verhallen, M. (2008). The assessment of deep word knowledge in young first and second language learners. Language Testing, 25(2), 211–236.

Shen, Z. (2008). The roles of depth and breadth of vocabulary knowledge in EFL reading performance. Asian Social Science, 4(12), 135–137.

Shillaw, J. (2009). Commentary on Part III: Developing and validating tests of L2 collocation knowledge. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in another language. New York: Palgrave.

Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.

Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. Brewster, NY: Touchstone Applied Science Associates.

Research Core: Table of Instruments Used in the Study (available as a PDF download).


Principal Investigators

Diane August
Dorry Kenyon

Center for Applied Linguistics

Research Team

Chris Barr
University of Houston

Lauren Artzi
Annie Duguay
Erin Haynes
Lindsey Massoud
Laura Wright

Center for Applied Linguistics