Considering Home Language as a Grouping Variable for DIF Analysis in an English Language Proficiency Assessment



Presented at: AAAL 2014

Among the challenges facing developers of large-scale assessments is ensuring that no test item unfairly advantages or disadvantages a particular group of test takers because of characteristics that are unrelated to the construct being evaluated. To identify potentially biased items, test developers conduct differential item functioning (DIF) analysis, which compares groups to determine whether people with the same underlying ability have a different probability of giving a certain response “because of some characteristic of the test item and/or testing situation” (Zumbo, 2007, p. 229). For English Language Proficiency (ELP) examinations, items are conventionally evaluated for DIF using ethnicity as a proxy for home language (i.e., comparing Hispanics and non-Hispanics). However, research has suggested that this approach may be too simplistic, and both language testers and researchers have become concerned with finding more appropriate ways of confirming test validity and fairness (APA, AERA, & NCME, 1999; Kim, 2001).
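As an illustration of the kind of comparison a DIF analysis makes, the sketch below computes a Mantel-Haenszel common odds ratio for a single item, matching test takers on total score as a stand-in for underlying ability. The abstract does not state which DIF procedure the study used, so the choice of Mantel-Haenszel, the function name `mantel_haenszel_dif`, the record layout, and the group labels `"reference"` and `"focal"` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Hypothetical helper: Mantel-Haenszel DIF statistics for one item.

    records: iterable of (group, total_score, item_correct), where group is
    "reference" or "focal" and item_correct is 0 or 1 (assumed layout).
    Returns (alpha, delta), or None when the odds ratio is undefined.
    """
    # One 2x2 table per matched score level:
    # [reference right, reference wrong, focal right, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        idx = (0 if group == "reference" else 2) + (0 if correct else 1)
        strata[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference right x focal wrong
        den += b * c / n  # reference wrong x focal right

    if num == 0 or den == 0:
        return None  # too little data to estimate the odds ratio

    alpha = num / den                # common odds ratio; 1.0 means no DIF
    delta = -2.35 * math.log(alpha)  # ETS delta scale; negative favors reference
    return alpha, delta
```

A common odds ratio near 1 (delta near 0) suggests that matched test takers in the two groups have similar odds of answering the item correctly. Values far from 1 would flag the item for closer review rather than establish bias on their own.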

This study considers the use of home language as a grouping variable in the DIF analysis of the ACCESS for ELLs® ELP assessment (Center for Applied Linguistics, 2012). While a number of studies (e.g., Chen & Henning, 1985; Sasaki, 1991; Ryan & Bachman, 1992; Kim, 2001; Li & Suen, 2012) have found that DIF exists when comparing the performances of speakers of Indo-European and non-Indo-European languages on tests of ELP, this project takes a slightly different approach to using home language to assess test validity. Based on the close linguistic relationship that English shares with Germanic and Romance languages (Finkenstaedt & Wolff, 1973; Williams, 1986), speakers of languages from these families are compared with students whose home languages are more distant from English. The results have applications for test development: they can inform qualitative studies that determine which test items merit close attention to ensure fairness across groups.