Issues in Accountability and Assessment for Adult ESL Instruction

Carol Van Duzer
National Center for ESL Literacy Education
February 2002

Throughout the 1990s, legislation increasingly required programs receiving federal funding to be more accountable for what they do. For adult education, these requirements have intensified the debate among practitioners, researchers, and policy makers as to what constitutes success and how to measure it. At the same time, the number of English language learners enrolled in adult education programs has been growing, particularly in areas of the country that have not previously seen many immigrants (Pugsley, 2001). New programs are being established to meet the demand for English as a second language (ESL) instruction, and existing programs are expanding.

This Q&A describes the legislative background of current accountability requirements for ESL programs, the issues involved in testing level gain, and critical questions whose answers can lead the field forward.

What does legislation require?

The Adult Education and Family Literacy Act (Title II of the Workforce Investment Act [WIA] of 1998) requires each state to negotiate target levels of performance with the U.S. Department of Education (ED) for three core indicators:
  1. demonstrated improvements in skill levels in reading, writing, and speaking the English language, numeracy, problem solving, English language acquisition, and other literacy skills;
  2. placement in, retention in, or completion of postsecondary education, training, unsubsidized employment, or career advancement; and
  3. receipt of a secondary school diploma or its recognized equivalent.

ED established the National Reporting System for Adult Education (NRS) to define how states are required to report their data. NRS identifies 12 functioning level descriptors, 6 for Adult Basic Education and 6 for English as a Second Language. The ESL level descriptors describe what a learner knows and can do in three areas: (a) speaking and listening, (b) reading and writing, and (c) functional and workplace skills (U.S. Department of Education, 1999-2001). These level descriptors define English language proficiency across six levels, from ESL Literacy to High Advanced.

Title II of the WIA also lists 12 criteria for states to consider when funding adult education and literacy activities. Among these criteria are establishing performance measures for learner outcomes, determining past effectiveness in meeting or exceeding these performance measures, and maintaining a high-quality information management system for reporting learner outcomes and monitoring program performance against the established measures. For measuring level gain, the NRS implementation document states that a standardized assessment procedure (e.g., a test or a performance assessment) is to be used.

How are states meeting these requirements?

To meet these criteria, each state has set its own performance standards in consultation with ED, indicating the percentage of learners that should progress from level to level in funded programs or across the state as a whole. A state can set different standards for different service providers or for different levels of proficiency. For example, the percentage of learners expected to move from ESL Literacy to Beginning ESL could be lower than the percentage expected to move from Beginning ESL to Low Intermediate. This recognizes that a learner who enters a program with no literacy skills may require a great deal of instruction before showing level gain. Each state is evaluated by ED according to the state's own performance standards. A few states (e.g., California) have instituted performance-based contracts by which programs receive money only for the learners who make certain gains.
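The comparison of level-gain percentages against state targets can be sketched in a few lines of Python. This is a minimal illustration only: the target percentages, the record layout, and the function names are assumptions made for the example, not values or procedures taken from any state plan or from the NRS.

```python
from collections import defaultdict

# Hypothetical targets: the share of learners at each entry level who are
# expected to advance a level. The numbers are illustrative only.
TARGETS = {"ESL Literacy": 0.20, "Beginning ESL": 0.30}


def level_gain_rates(records):
    """Return {entry_level: fraction of learners who advanced a level}.

    `records` is an iterable of (entry_level, advanced) pairs, where
    `advanced` is True if the learner moved up at least one level.
    """
    enrolled = defaultdict(int)
    advanced = defaultdict(int)
    for level, gained in records:
        enrolled[level] += 1
        if gained:
            advanced[level] += 1
    return {level: advanced[level] / enrolled[level] for level in enrolled}


def meets_targets(rates, targets):
    """Return {entry_level: True if the observed rate meets the target}."""
    return {level: rates.get(level, 0.0) >= goal for level, goal in targets.items()}


# Made-up learner records for one hypothetical program.
records = [
    ("ESL Literacy", True), ("ESL Literacy", False),
    ("ESL Literacy", False), ("ESL Literacy", False),
    ("Beginning ESL", True), ("Beginning ESL", False),
    ("Beginning ESL", False), ("Beginning ESL", False),
]
rates = level_gain_rates(records)
result = meets_targets(rates, TARGETS)
```

In this made-up data, one in four learners advances at each entry level. That rate satisfies the lower ESL Literacy target but falls short of the higher Beginning ESL target, which shows how level-specific targets change the verdict on identical performance.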

States have also designated specific assessment tools or processes that programs may use to show level gain. These tools and processes vary among the states. Most states have chosen a standardized test (e.g., California: Comprehensive Adult Student Assessment System [CASAS], Texas: Basic English Skills Test [BEST], and New York: New York State Placement Test [NYSPlace]); several give choices among a list of approved tests (e.g., Arkansas: BEST or CASAS); and a few allow a standardized test for initial-level determination and then a competency checklist or uniform portfolio for exit-level determination (e.g., Florida and Ohio). For contact information for the BEST, CASAS, and NYSPlace, see the Adult ESL Tests section of this document.

What are the issues in testing level gain?

NRS level descriptors

Programs are required to report the percentage of learners that move from level to level during the funding year. However, no research has established how long it takes to advance one NRS level. Because it takes several years to learn a language well (Thomas & Collier, 1997), such information is crucial in high-stakes assessment. The time it takes to show level gain on a proficiency scale depends on both program and learner factors. Program factors include intensity of the classes (how long and how many times per week); training and experience of the instructors; adequacy of facilities (e.g., comfort, lighting); and resources available to both instructors and learners. Learner factors include educational background, degree of literacy in the native language, age, experience with trauma, and opportunities to use the language outside of instructional time. Stakeholders need to know under what conditions (with which combinations of learner and program factors) NRS level gains are achievable.

Standardized testing

One way to test language development is through the use of standardized tests, which are developed according to explicit specifications. Test items are chosen for their ability to discriminate among levels, and administration procedures are consistent and uniform. Pencil-and-paper standardized tests are often used because they are easy to administer to groups, require minimal training for the test administrator, and have documentation of reliability (consistency of results over time) and validity (measuring what the test says it measures) (Holt & Van Duzer, 2000).

Despite the advantages, standardized tests have limitations. Their results will have meaning to learners and teachers only if the test content is related to the goals and content of the instruction (Van Duzer & Berdan, 1999). Adult education programs are often tailored to take advantage of the few hours (typically 4-8 hours per week) that adult learners are available to study. Instruction may focus on a limited number of learner goals (e.g., finding a better job or helping children with their homework). If the items in a standardized test reflect the actual curriculum, then the test may accurately assess achievement of the learners. However, if the items do not reflect what is covered in the classroom, the test may not adequately assess what learners know and can do. Given the focus on real-life, practical content in adult ESL instruction, using a test that assesses everyday vocabulary and tasks (e.g., BEST or CASAS) can yield satisfactory results.

There is concern, however, that standardized tests may not be able to capture the incremental changes in learning that occur over short periods of instructional time. Test-administration manuals usually recommend the minimum number of hours of instruction that should occur between pre- and post-testing, yet the learning that takes place within that time frame is dependent on the program and learner factors discussed previously. In the effort to make sure that learners are tested and counted before they leave, program staff may be post-testing before adequate instruction has been given. In such cases, learners may not show enough progress to advance a level unless they pre-tested near the high end of the score ranges for a particular NRS level.
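The effect of pre-test position within a level's score range can be illustrated with a short Python sketch. The score ranges and level labels below are hypothetical and do not correspond to the cut scores of the BEST, CASAS, or any other test; the function names are likewise assumptions for the example.

```python
from bisect import bisect_right

# Hypothetical lowest scale score for each level (NOT actual cut scores).
LEVEL_FLOORS = [0, 21, 41, 61]
LEVEL_NAMES = ["Level A", "Level B", "Level C", "Level D"]


def level_index(score):
    """Return the index of the level whose score range contains `score`."""
    if score < LEVEL_FLOORS[0]:
        raise ValueError("score is below the lowest level floor")
    return bisect_right(LEVEL_FLOORS, score) - 1


def made_level_gain(pre_score, post_score):
    """True if the post-test score falls in a higher level than the pre-test score."""
    return level_index(post_score) > level_index(pre_score)


# Two learners with the same 8-point gain but different starting points.
low_start = made_level_gain(22, 30)   # starts near the bottom of Level B
high_start = made_level_gain(36, 44)  # starts near the top of Level B
```

Under these assumed ranges, both learners gain eight points, but only the learner who pre-tested near the top of the range crosses into the next level. The other learner's progress is real but would not be reported as a level gain.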

Performance assessment

Performance assessments require learners to use prior knowledge and recent learning to accomplish tasks that demonstrate what they know and can do. There is a direct link between instruction and assessment. Examples of performance assessment tasks include oral or written reports (e.g., on how to become a citizen); projects (e.g., researching, producing, and distributing a booklet on recreational opportunities available in the community); and exhibitions or demonstrations (e.g., a poster depicting the steps to becoming a U.S. citizen). A variety of performance assessments provide a more complete picture of a learner's abilities than can be gathered from performance on a pencil-and-paper standardized test.

For adult ESL, performance assessment reflects current thought about second language acquisition: Learners acquire language as they use it in social interactions to accomplish purposeful tasks (e.g., finding information or applying for a job). The performance may be assessed simply by documenting the successful completion of the task or by the use of rubrics designed to assess various dimensions of carrying out the task (e.g., rating oral presentation skills on a scale of 1-5). Both instructors and learners can be involved in the development of evaluation guidelines and in the evaluation procedure itself (Van Duzer & Berdan, 1999).
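The two approaches to judging a performance, documenting task completion or scoring with a rubric, can be sketched as follows. The rubric dimensions and function name are hypothetical; only the 1-5 scale comes from the example in the text.

```python
# Hypothetical rubric dimensions for an oral presentation task.
DIMENSIONS = ("organization", "pronunciation", "vocabulary")
SCALE_MIN, SCALE_MAX = 1, 5


def rubric_score(ratings):
    """Return the mean rating across all rubric dimensions.

    `ratings` maps each dimension name to an integer from 1 to 5.
    Raises ValueError if a dimension is missing or a rating is off the scale.
    """
    for dim in DIMENSIONS:
        if dim not in ratings:
            raise ValueError(f"missing rating for {dim!r}")
        if not SCALE_MIN <= ratings[dim] <= SCALE_MAX:
            raise ValueError(f"rating for {dim!r} is outside {SCALE_MIN}-{SCALE_MAX}")
    return sum(ratings[dim] for dim in DIMENSIONS) / len(DIMENSIONS)


# Simple documentation: record only whether the task was completed.
task_completed = True

# Rubric-based assessment: rate each dimension, then summarize.
ratings = {"organization": 4, "pronunciation": 3, "vocabulary": 5}
score = rubric_score(ratings)
```

A single averaged number is only one possible summary. A program might instead report each dimension separately so that learners can see where to focus.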

Although performance assessments provide valuable information to learners, instructors, and other program staff, their use for accountability purposes is currently limited. These types of assessment are time-consuming to administer and score. To produce the reliable, quantifiable data required for high-stakes assessment, performance assessments would need to be standardized. That is, for each of the NRS functioning levels, tasks would need to be developed (and agreed upon) that would represent level completion; scoring rubrics and guidelines for evaluating performance would need to be in place; and administrators and evaluators would need to be trained.

What attempts at standardizing performance assessment are being undertaken?

A few projects are attempting to develop performance assessments that would be acceptable for the NRS.
  • Ohio is developing a uniform portfolio system of performance assessment that is being validated by Ohio State University (Gillette, 2001).
  • Colorado developed a certificate system based on performance assessments that was discarded in favor of a standardized test for NRS reporting. However, the Colorado Department of Education is working with CASAS to standardize and validate one level of the Colorado Certificate of Accomplishment so that it meets the rigors of high-stakes assessment (K.
  • The National Institute for Literacy's (NIFL) Equipped for the Future (EFF) project staff is working with programs in several states to develop a continuum of performance for the EFF adult literacy content standards so that performance assessment tasks can be constructed (Stein, 2001).
  • ED's Office of Vocational and Adult Education (OVAE) is supporting two performance assessment projects: (a) The Test of Emerging Literacy (TEL) is being developed by American Institutes for Research (AIR) with additional support from Arizona, Massachusetts, and Washington, and (b) the BEST Oral Interview is being revised by the Center for Applied Linguistics (CAL) in order to assess the full range of NRS functioning levels. (Two versions of the Interview, one print and one computer-adaptive, will be available in Fall 2002.)
  • OVAE and NIFL are supporting the National Academy of Sciences' review of standards for alternative performance assessment (National Academies Board on Testing and Assessment, 2001).

For the time being, however, performance assessments remain difficult and costly to produce for high-stakes reporting (Wrigley, 2001).

What are the critical questions to be answered?

The issues discussed in this Q&A point to several critical questions that need to be examined to move the field of adult education forward in solving the complexities of defining learner progress and how to measure it.

1. What should be counted as success, and how should it be measured? What learners, instructors, and program staff count as success may differ from what is measured by state-mandated assessment procedures. Level gain is just one possible outcome of instruction. Equally important to learner success may be an increase in literacy practices (e.g., reading a greater variety of print materials, reading to children); achievement of a personal goal (e.g., passing the citizenship test, receiving a job promotion); or an increase in confidence and self-esteem. States are currently able to count these outcomes in their own evaluation plans if they so choose. However, a change in the legislation would be required for the outcomes to be allowable under the provisions of the WIA.

Stakeholders should work together to identify what combination of assessments (e.g., standardized, performance, logs of increased practices, goal attainment, observations of increased confidence) will yield useful information for designing, modifying, and improving programs. If accountability continues to rest mainly on the results of standardized testing, then there is a need for additional language-based instruments that measure more than one skill (i.e., listening, speaking, reading, writing, grammar). Information about legislative requirements, learner goals and needs, and assessment specifications (e.g., what is purported to be measured, reliability study results) should be clear to each stakeholder. If a legislative change is warranted, then stakeholders should work with ED and legislators to have it enacted.

2. How well does the NRS scale facilitate the reporting of learner progress? The NRS looks at functional level gain as one of three core indicators by which programs can measure their success. However, no data are available that identify how long it takes to make a level gain and under what conditions (program type, intensity and length of instruction, resources and support services available). Adult education services are provided by a wide variety of institutions (e.g., local education agencies, community colleges, libraries, community-based and volunteer organizations, businesses, and unions) under varying conditions. The complex lives of the learners can leave them with little time for educational pursuits. The interrelationship among the time and conditions it takes to make a level gain, the assessment procedure chosen to measure that gain, and the resources available to assess it needs to be examined.

3. What is the cost in time, staffing, and funds to effectively assess and document learning outcomes? Adult education programs generally have limited operating funds. The implementation of standardized assessments, whether in a small program or a large one, requires extra staffing time, often beyond the limits of the funding received. In programs with large numbers of learners with low literacy skills, it is a tremendous challenge just to ensure that test forms are properly filled out (e.g., name and identification number) and answers are marked in appropriate places. Additional costs may be incurred as programs train staff or hire additional staff to develop, administer, or score assessments in a way that assures reliable and timely results.

4. What changes in program design and staff development are needed to ensure that assessment tools are reliably used? Even though standardized tests and some performance assessments have guidelines for administering and scoring, test administrators may not be following them. As mentioned above, some programs and states are post-testing too soon after pre-testing because they are concerned that learners may leave the program before they are post-tested. However, learners may not show progress if they have not had adequate instruction time between test administrations. To ensure consistent and reliable assessment, administration procedures need to be carefully followed and adequate resources need to be allocated for training.

5. How do local, state, and national policies affect assessment tools and practices, and what policies need to be created? At the national level, the WIA and the NRS have set criteria that states must meet in order to receive federal funding. States have leeway, however, to set their own performance measures and select their own assessment procedures. Not all program staff may be aware of these policies, and their attitudes toward being required to use certain assessments may affect the results. What impact does a policy of mandating particular assessments have on programs? How does it differ from what is happening in other states where, to receive funding, programs are required only to meet or exceed a target percentage of learners making level gain? Are there differences in results between states that require certain assessment tools and states that allow programs to choose?


Conclusion

The United States has made progress over the past decade in creating a cohesive system of adult education through legislation such as the Workforce Investment Act and frameworks such as the National Reporting System for Adult Education. Finding answers to the questions presented here will contribute to the evolving system. At the same time, the political environment that presses for accountability creates tension with the enormous amount of time it takes to build such a system. As program staff in both new and established programs struggle with accountability issues, they need to advocate for sound assessment policies at the local, state, and national levels, and for the resources to implement them.


References

Gillette, G. W. (2001, November 1). Alternative assessments. Message posted to National Literacy Advocacy discussion list, archived at

Holt, D., & Van Duzer, C. (2000). Assessing success in family literacy and adult ESL (Rev. ed.). McHenry, IL & Washington, DC: Delta Systems & Center for Applied Linguistics.

National Academies Board on Testing and Assessment (BOTA). (2001, December). Performance assessments for adult education: Exploring measurement issues. Symposium conducted at the meeting of BOTA, Washington, DC.

Pugsley, R. (2001, February 15). The learner population in adult ESL programs. Presentation at Symposium on Adult ESL Practice in the New Millennium. Washington, DC: National Center for ESL Literacy Education. (Full symposium proceedings available at

Stein, S. (2001, November 9). Can research improve policy or practice? Message posted to National Literacy Advocacy discussion list, archived at

Thomas, W. P., & Collier, V. (1997). School effectiveness for language minority students. Washington, DC: George Mason University. (Available at

U.S. Department of Education. (1999-2001). NRS online. Washington, DC: Author. (Available at

Van Duzer, C., & Berdan, R. (1999). Perspectives on assessment in adult ESOL instruction. In J. Comings, B. Garner, & C. Smith (Eds.), The annual review of adult learning and literacy (pp. 200-242). San Francisco: Jossey-Bass.

Workforce Investment Act of 1998, Pub. L. No. 105-220, 212.b.2.A, 112 Stat. 936 (1998).

Wrigley, H. S. (2001, Winter). Assessment and accountability: A modest proposal. Field Notes, 10(3), 1, 4-7.

Adult ESL Tests

BEST (Basic English Skills Test)
Center for Applied Linguistics
4646 40th Street NW, Washington, DC 20016-1859

CASAS (Comprehensive Adult Student Assessment System)
8910 Clairemont Mesa Boulevard, San Diego, CA 92123

NYSPlace (New York State Placement Test for Adult ESL Students)
City School District of Albany, Albany Educational TV
27 Western Avenue, Albany, NY 12203
518-462-7292 x30

This document was produced at the Center for Applied Linguistics (4646 40th Street, NW, Washington, DC 20016 202-362-0700) with funding from the U.S. Department of Education (ED), Office of Vocational and Adult Education (OVAE), under Contract No. ED-99-CO-0008. The opinions expressed in this report do not necessarily reflect the positions or policies of ED. This document is in the public domain and may be reproduced without permission.