Module 7: Resources
Key Assessment Terms (Glossary)

Accountability Responsibility for educational outcomes;  these outcomes are often measured through standardized testing.
Achievement test A test that measures how well a student has reached the objectives of a specific course or program
ACTFL proficiency levels Guidelines developed by the American Council on the Teaching of Foreign Languages (ACTFL) that describe language performance
Alternative assessment Non-traditional forms of assessment;  may include portfolios, observations, work samples, or group projects 
Analytic scoring Method of scoring or rating that assigns separate scores for different aspects of a student's performance 
Aptitude test Test which measures a student's talent for learning language;  predicts future performance
Assessment The process of gathering information
Assessment literacy Knowledge about and a thorough understanding of myriad assessment practices, especially by educators
Authenticity How well a test reflects real-life situations
Cloze test Test that measures comprehension by asking students to fill in missing words from a passage
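As an illustration, a cloze passage can be produced by blanking out words at a fixed interval. The sketch below assumes fixed-interval deletion, which is only one possible way to choose the missing words; the function name `make_cloze` and its parameters are hypothetical, and punctuation attached to a word is removed along with it.

```python
def make_cloze(passage: str, every_nth: int = 5, blank: str = "_____"):
    """Replace every nth word with a blank; return (cloze_text, answer_key)."""
    words = passage.split()
    answers = []
    # Start at the nth word (index n - 1) and step through the passage.
    for i in range(every_nth - 1, len(words), every_nth):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


text, key = make_cloze(
    "The quick brown fox jumps over the lazy dog near the river",
    every_nth=4,
)
print(text)  # The quick brown _____ jumps over the _____ dog near the _____
print(key)   # ['fox', 'lazy', 'river']
```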
Computer-adaptive test Computer-based test that adapts to the test-taker's performance and presents easier or more difficult tasks based on previous answers
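The adaptive idea can be sketched with a simple up/down rule: a correct answer raises the difficulty of the next task and an incorrect answer lowers it. This is an assumption made for illustration only; operational computer-adaptive tests typically rely on statistical models rather than a fixed step. The names `next_difficulty` and `run_adaptive` and the 1 to 5 difficulty range are hypothetical.

```python
def next_difficulty(current: int, was_correct: bool,
                    lowest: int = 1, highest: int = 5) -> int:
    """Move one level up after a correct answer, one level down otherwise."""
    step = 1 if was_correct else -1
    # Keep the level inside the available difficulty range.
    return max(lowest, min(highest, current + step))


def run_adaptive(responses, start: int = 3):
    """Return the difficulty level presented before and after each response."""
    levels = [start]
    for was_correct in responses:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels


print(run_adaptive([True, True, False, True, True, True]))
# [3, 4, 5, 4, 5, 5, 5]
```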
Construct What a test measures
Construct validity How well a test measures what it is supposed to measure
Content validity How well the content of a test reflects the construct that the test is measuring
Criterion-referenced Scores interpreted with respect to standards or a theory of language;  everyone can get a high score.
Cutoff score On a criterion-referenced test, the minimum score a student must receive to demonstrate a determined level
Direct testing Testing method that closely matches the construct being measured
Discrete test Test focused on specific language skills
Diagnostic test Test that identifies a student's strengths and weaknesses
Evaluation Making decisions based on the results of assessment
Face validity Non-technical term that refers to how fair, reasonable and authentic people perceive a test to be
Formative assessment An assessment used during the course of instruction to provide feedback to the teacher and learner about the learner's progress toward desired educational outcomes;  the results of formative assessments are often used in planning subsequent instruction.
High-stakes test Assessment that is used to make critical decisions with consequences for one or more stakeholders in the assessment process;  an admissions test that determines the course of a student's academic future and a test used for accountability and linked to funding are both examples of high-stakes tests.
Holistic scoring Method of rating an assessment based on general descriptions of performance at specified levels;  while a holistic scoring rubric may take into account performance along several dimensions (e.g., fluency, grammatical accuracy, and word choice for oral language), one overall score which best represents the examinee's performance is assigned.
Impact The positive or negative effects of testing
Indirect testing A method of testing that measures abilities related to the construct being tested, rather than the construct itself
Input The materials (presented aurally and visually) that an examinee receives as part of the test tasks
Integrative test Test that addresses multiple language skills, sometimes in the same task
Multiple choice test Test in which examinees demonstrate knowledge, skill, or ability by selecting a response from a list of possible answers
Needs assessment Inquiry into the current state of knowledge, resources, or practice with the intent of taking action, making a decision, or providing a service with the results
Norm-referenced Scores interpreted with respect to other examinees; some must score high, some low.
Off-the-shelf Commercially available test which can be purchased by an educational institution or individual user and administered at the discretion of the individual user
Parallel forms Two or more tests with different questions that measure the same underlying skill and whose difficulty levels have been determined to be equivalent;  scores from parallel versions of a test can be compared with one another.
Percentile Range of measures from 1-99 used to compare examinees with one another;  an examinee who scored in the 80th percentile placed higher than 80% of test takers.
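The 80th-percentile example can be checked with a short calculation. The sketch below assumes the "percentage of examinees who scored lower" convention and uses made-up scores; other conventions treat tied scores differently, and this simple version can return 0 for the lowest score rather than staying within 1-99. The function name `percentile_rank` is hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of examinees whose score is strictly lower than `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Illustrative scores for ten examinees (not real data).
scores = [52, 60, 61, 67, 70, 74, 78, 81, 85, 93]
print(percentile_rank(85, scores))  # 80.0
```

Here eight of the ten examinees scored below 85, so an examinee with 85 placed higher than 80% of test takers.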
Performance assessment Assessment which requires the examinee to demonstrate knowledge or skill through activities that are often direct, active, and hands-on, such as giving a speech, performing a skit, or producing an artistic product
Placement test Test whose results are used to assign students to classes designed for learners at a particular level
Practicality Feasibility of test given materials, funding, time, expertise, and staff
Proficiency test Test of ability in a defined area of language;  the area may be narrowly defined (e.g., English for airline pilots) or broader (e.g., social and academic language).  Proficiency tests are not tied to a specific curriculum or course and are often contrasted with achievement tests.
Program evaluation Process of collecting data from multiple sources about an instructional program or intervention and making a decision about the success of the program based on this information;  the evaluation could target both the process and outcomes of the program.
Raw score Student's total number of correct responses on a test
Reliability Consistency of scores/results
Scale score Score that allows test results to be compared across students;  in standardized testing, raw scores are often converted to scale scores.
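To make the raw-to-scale conversion concrete, the sketch below uses a simple linear mapping. This is an assumption for illustration: the glossary does not say how the conversion is done, and testing programs generally use more elaborate procedures. The function name `to_scale_score`, the 40-item test, and the 100-500 reporting scale are all hypothetical.

```python
def to_scale_score(raw: int, max_raw: int,
                   scale_min: float = 100, scale_max: float = 500) -> float:
    """Linearly map a raw score in [0, max_raw] onto [scale_min, scale_max]."""
    if not 0 <= raw <= max_raw:
        raise ValueError("raw score must be between 0 and max_raw")
    return scale_min + (raw / max_raw) * (scale_max - scale_min)


# 30 correct responses on a hypothetical 40-item test.
print(to_scale_score(30, 40))  # 400.0
```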
Scoring method Describes how scoring is accomplished (e.g., machine-scored, hand-scored, centrally scored, locally scored)
Scoring process Describes the procedures used to obtain a test score, e.g., counting the number correct, scoring holistically or analytically according to established guidelines, a scale, or a rubric
Self-assessment Personal rating of language ability according to specified criteria
Skills test Test focusing on a specific domain of language use, e.g., listening, reading, writing or speaking (interactive or presentational)
Stakeholders Persons involved with or invested in the testing process, e.g. test takers, administrators, parents, and teachers/instructors
Standardized test Test with fixed content, equivalent parallel forms, and standard administration and scoring that has been field-tested and shown to be valid and reliable
Subscore Score that represents student performance in a particular domain or part of a test
Summative assessment Outcome-based use of assessments, often for decisions such as grading, program evaluation, tracking, or accountability
Test accommodation “Any change to a test or testing situation that addresses a unique need of the student but does not alter the construct being measured” (Center for Equity and Excellence in Education, 2006)
Test administration Delivery of the test items/directions to the test-takers
Test construct General, overarching theory, based on a theory of language, that informs the test development process
Test development Process of creating a test;  steps of test development (Hughes, 2003):

1. State the goals of the test.

2. Write test specifications.

3. Write and revise items.

4. Try items with native speakers and accept/reject items.

5. Pilot with non-native speakers whose backgrounds are similar to those of the intended test-takers.

6. Analyze the trials and make necessary revisions.

7. Calibrate scales.

8. Validate.

9. Write test administrator handbook, test materials.

10. Train staff as appropriate.

Test format Mode and organization of test, test structure (e.g., multiple choice, short answer)
Test items Tasks, questions or prompts to which test-takers respond
Test materials Items used for the test administration/taking
Test purpose What you want to learn from the test results
Testing Valid and reliable practice of language measurement for context-specific purposes
Validity Validity is a judgment about whether a test is appropriate for a specific group and purpose and includes considerations such as whether the test really measures what you think it is measuring, whether the results are similar to examinees' performance on other tests or in class or real-world activities, and whether the use of test results has the intended effects.
Washback Effects of a test on teachers’ and students’ actions;  washback can be positive (expected) or negative (unexpected, harmful).