Key Assessment Terms
Accountability: Responsibility for educational outcomes; these outcomes are often measured through standardized testing.
Achievement test: A test that measures how well a student has reached the objectives of a specific course or program
ACTFL proficiency levels: Guidelines developed by the American Council on the Teaching of Foreign Languages (ACTFL) that describe language performance
Alternative assessment: Non-traditional forms of assessment; may include portfolios, observations, work samples, or group projects
Analytic scoring: Method of scoring or rating that assigns separate scores for different aspects of a student’s performance
Aptitude test: Test that measures a student’s talent for learning languages; used to predict future performance
Assessment: An ongoing process of setting clear goals for student learning and measuring progress toward these goals
Assessment literacy: Knowledge and thorough understanding of a wide range of assessment practices, especially among educators
Authenticity: How well a test reflects real-life situations
Cloze test: Test that measures comprehension by asking students to fill in missing words from a passage
Computer-adaptive test: Computer-based test that adapts to the test-taker’s performance and presents easier or more difficult tasks based on previous answers
Construct: What a test measures
Construct validity: How well a test measures what it is supposed to measure
Content validity: How well the content of a test reflects the construct that the test is measuring
Criterion-referenced: Scores interpreted with respect to standards or a theory of language; everyone can get a high score.
Cutoff score: On a criterion-referenced test, the minimum score a student must receive to demonstrate a specified level of performance
Diagnostic test: Test that identifies a student’s strengths and weaknesses
Direct testing: Testing method that closely matches the construct being measured
Discrete test: Test focused on specific language skills
Evaluation: Making decisions based on the results of assessment
Face validity: Non-technical term that refers to how fair, reasonable, and authentic people perceive a test to be
Formative assessment: An assessment used during the course of instruction to provide feedback to the teacher and learner about the learner’s progress toward desired educational outcomes; the results of formative assessments are often used in planning subsequent instruction.
High-stakes test: Assessment that is used to make critical decisions with consequences for one or more stakeholders in the assessment process; an admissions test that determines the course of a student’s academic future and a test used for accountability and linked to funding are both examples of high-stakes tests.
Holistic scoring: Method of rating an assessment based on general descriptions of performance at specified levels; while a holistic scoring rubric may take into account performance along several dimensions (e.g., fluency, grammatical accuracy, and word choice for oral language), one overall score that best represents the examinee’s performance is assigned.
Impact: The positive or negative effects of testing
Indirect testing: A method of testing that measures abilities related to the construct being tested, rather than the construct itself
Input: The materials (presented aurally or visually) that an examinee receives as part of the test tasks
Integrative test: Test that addresses multiple language skills, sometimes in the same task
Multiple-choice test: Test in which examinees demonstrate knowledge, skill, or ability by selecting a response from a list of possible answers
Needs assessment: Inquiry into the current state of knowledge, resources, or practice with the intent of taking action, making a decision, or providing a service with the results
Norm-referenced: Scores interpreted with respect to other examinees; some must score high, some low.
Off-the-shelf: Commercially available test that can be purchased by an educational institution or individual user and administered at the user’s discretion
Parallel forms: Two or more tests with different questions that measure the same underlying skill and whose difficulty levels have been determined to be equivalent; scores from parallel versions of a test can be compared with one another.
Percentile: Rank on a scale from 1 to 99 used to compare examinees with one another; an examinee who scored at the 80th percentile scored higher than 80% of test takers.
Performance assessment: Assessment that requires the examinee to demonstrate knowledge or skill through activities that are often direct, active, and hands-on, such as giving a speech, performing a skit, or producing an artistic product
Placement test: Test whose results are used to assign students to classes designed for learners at a particular level
Practicality: Feasibility of a test given the available materials, funding, time, expertise, and staff
Proficiency test: Test of ability in a defined area of language; the area may be narrowly defined (e.g., English for airline pilots) or broader (e.g., social and academic language). Proficiency tests are not tied to a specific curriculum or course and are often contrasted with achievement tests.
Program evaluation: Process of collecting data from multiple sources about an instructional program or intervention and making a decision about the success of the program based on this information; the evaluation could target both the process and outcomes of the program.
Raw score: Student’s total number of correct responses on a test
Reliability: Consistency of scores or results
Scale score: Score that allows test results to be compared across students; in standardized testing, raw scores are often converted to scale scores.
Scoring method: Describes how scoring is accomplished (e.g., machine-scored, hand-scored, centrally scored, locally scored)
Scoring process: Describes the procedures used to obtain a test score, e.g., counting the number correct, scoring holistically or analytically according to established guidelines, a scale, or a rubric
Self-assessment: Personal rating of language ability according to specified criteria
Skills test: Test focusing on a specific domain of language use, e.g., listening, reading, writing, or speaking (interactive or presentational)
Stakeholders: Persons involved with or invested in the testing process, e.g., test takers, administrators, parents, and teachers/instructors
Standardized test: Test with fixed content, equivalent parallel forms, and standard administration and scoring procedures that has been field-tested and shown to be valid and reliable
Subscore: Score that represents student performance in a particular domain or part of a test
Summative assessment: Outcome-based use of assessments, often for decisions such as grading, program evaluation, tracking, or accountability
Test accommodation: “Any change to a test or testing situation that addresses a unique need of the student but does not alter the construct being measured” (Center for Equity and Excellence in Education, 2006)
Test administration: Delivery of the test items and directions to the test-takers
Test development: Process of creating a test; steps of test development (Hughes, 2003):

1. State the goals of the test.

2. Write test specifications.

3. Write and revise items.

4. Try out items with native speakers and accept or reject items.

5. Pilot the test with non-native speakers whose backgrounds are similar to those of the intended test-takers.

6. Analyze the trials and make necessary revisions.

7. Calibrate scales.

8. Validate.

9. Write the test administrator handbook and test materials.

10. Train staff as appropriate.

Test format: Mode and organization of a test; the test structure (e.g., multiple choice, short answer)
Test items: Tasks, questions, or prompts to which test-takers respond
Test materials: Items used for the test administration/taking
Test purpose: What you want to learn from the test results
Testing: Valid and reliable practice of language measurement for context-specific purposes
Validity: A judgment about whether a test is appropriate for a specific group and purpose. It includes considerations such as whether the test really measures what you think it is measuring, whether the results are similar to examinees’ performance on other tests or in class or real-world activities, and whether the use of test results has the intended effects.
Washback: Effects of a test on teachers’ and students’ actions; washback can be positive (intended, beneficial) or negative (unintended, harmful).
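Several entries above (raw score, percentile, cutoff score, norm-referenced and criterion-referenced interpretation, reliability) describe simple computations. The Python sketch below illustrates them on a tiny, made-up set of right/wrong item responses. All names, data, and the cutoff value are hypothetical. Cronbach's alpha is used here as one common internal-consistency estimate of reliability; the glossary does not prescribe a particular statistic. Real percentile ranks come from a large norming group, not five examinees.

```python
from statistics import pvariance

# Hypothetical data for illustration only: 1 = correct, 0 = incorrect.
# Each list holds one examinee's responses to the same five items.
responses = {
    "Ana":   [1, 1, 1, 0, 1],
    "Ben":   [1, 0, 1, 0, 0],
    "Chloe": [1, 1, 1, 1, 1],
    "Dev":   [0, 0, 1, 0, 0],
    "Eli":   [1, 1, 0, 1, 1],
}

CUTOFF = 4  # hypothetical criterion-referenced cutoff score


def raw_score(items):
    """Raw score: the total number of correct responses."""
    return sum(items)


def percentile_rank(score, all_scores):
    """Norm-referenced interpretation: the percentage of the comparison
    group scoring below `score`, clipped to the conventional 1-99 range."""
    below = sum(1 for s in all_scores if s < score)
    pct = round(100 * below / len(all_scores))
    return min(99, max(1, pct))


def meets_cutoff(score, cutoff=CUTOFF):
    """Criterion-referenced interpretation: compare the score with a fixed
    standard, so every examinee can, in principle, meet it."""
    return score >= cutoff


def cronbach_alpha(matrix):
    """Internal-consistency reliability estimate (Cronbach's alpha).
    Rows are examinees, columns are items."""
    k = len(matrix[0])  # number of items
    item_vars = [pvariance([row[i] for row in matrix]) for i in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    raws = {name: raw_score(items) for name, items in responses.items()}
    group = list(raws.values())
    for name, score in raws.items():
        print(f"{name}: raw={score}, "
              f"percentile={percentile_rank(score, group)}, "
              f"meets cutoff={meets_cutoff(score)}")
    print(f"alpha = {cronbach_alpha(list(responses.values())):.2f}")
```

With this toy data, Chloe's raw score of 5 gives a percentile rank of 80, Dev's score of 1 (0% of the group below) is reported as 1, three of the five examinees meet the cutoff of 4, and alpha comes out at about 0.65. Population variances are used for both the item and total variances; using sample variances for both would give the same alpha, because the n/(n-1) factor cancels in the ratio.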
Online & Print Resources

Print Resources 

Book with in-depth information on measurement, language test uses and methods, reliability, and validity

  • Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.

A practical guide to developing your own classroom assessments

  • Brown, H. D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices (2nd ed.). White Plains, NY: Pearson Education.

A book which provides a thorough but accessible overview of foundational concepts in language testing

  • Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge: Cambridge University Press.

Handbook which explains the principles of backward design for classroom assessment

  • Wiggins, G., & McTighe, J. (2005). Understanding by design (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.