In education and assessment, we use the word “standards” in a number of ways: curriculum standards, standards-based assessments, performance standards. Performance standards—also known as proficiency levels, achievement levels, performance descriptors, and more—are one way we report assessment results, and have a direct influence on decisions that affect educators and students every day.
Many of us use and discuss these performance standards without knowing where they come from. Performance standards begin as a policy initiative that expresses the proficiency expected of students in an assessment program. They are then defined separately for each content area and grade level, after at least one year of operational administration. Standard setting is the process education experts use to relate test scores from an assessment program to pre-defined achievement levels.
Here we explain the three basic facets of standard setting: purpose, use, and process.
1. Purpose of standard setting
The standard-setting process results in consistent matching of test-score information to levels of achievement that influence and are influenced by policy considerations and decisions made by assessment program leaders and educators. Well-defined achievement standards help get everyone on the same page to discuss important educational questions such as, “What aspirations do we have for our children?” and more tactical instruction and learning questions like, “What are reasonable and attainable goals for student achievement at a given grade, for the next grade, and for the one after that?” Standard setting enhances descriptions of student achievement beyond what score scales alone can provide.
2. Use of achievement-level descriptors
Once we define the achievement-level “cut scores,” which divide the score scale into meaningful categories, we can derive thoughtful inferences about what students know and can do in a content area and grade. The descriptions are also designed to provide actionable information for use in classroom configuration and for taking next steps in student learning. Achievement-level descriptors (ALDs) help inform next steps that best support struggling students in reaching the standard of proficiency, and also suggest opportunities to challenge students to continue their learning progressions.
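To make the idea of cut scores dividing a score scale concrete, here is a minimal sketch in Python. The cut scores (300, 350, 400), the "Below Basic" label, and the function name are hypothetical values chosen for illustration; they do not come from any real assessment program.

```python
from bisect import bisect_right

# Hypothetical cut scores: each is the lowest scale score in the next level up.
CUT_SCORES = [300, 350, 400]
# Hypothetical level names; there is always one more level than cut scores.
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]


def achievement_level(scale_score, cut_scores=CUT_SCORES, levels=LEVELS):
    """Return the achievement level whose score range contains scale_score."""
    if len(levels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more level than cut scores")
    # bisect_right counts how many cut scores are at or below the score,
    # which is also the index of the matching level.
    return levels[bisect_right(cut_scores, scale_score)]
```

In this sketch a student who scores exactly at a cut score is placed in the higher level, which is one common convention; a program's technical report would state the rule it actually uses.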
3. Standard-setting process
Standard setting occurs through a well-researched and systematic process. Educators and stakeholders with content, teaching, and testing expertise make up the standard-setting team. For every grade and content area assessed, the team establishes “cut scores” that define the boundary scores for the achievement levels. Standard setting is typically facilitated by experts familiar with educational assessment and standard setting.
One key to a principled assessment system is to write ALDs at the beginning of the test-development process. The ALDs, created by teams similar to those that set standards, help guide the creation of a new assessment, because they summarize the knowledge, skills, and abilities expected of students at each achievement level.
While there are many methods for setting standards on achievement tests, the procedures followed in standard-setting workshops are similar across them. The four methods described below are widely used.
- Angoff method. In this method, content experts estimate, for each item in the test, the percentage of students who are just barely in an achievement level who would respond successfully (or, equivalently, the probability that such a student would respond successfully). Variations of this method can be applied for multiple-choice and other selected-response items, and for constructed-response and other items that have multiple potential score points.
- Bookmark method. In this method, content experts determine cut scores using sequences of items ordered on the test’s score scale. They consider which item in each sequence represents the last item that students who are just barely in an achievement level would get correct, estimating that these students have less than a specified probability (usually 0.67 or 0.50) of responding successfully to the subsequent items. This process can be applied for multiple-choice and other selected-response items, and for constructed-response and other items that have multiple potential score points. It requires that the items be placed on a single uniform scale so that item difficulty can be compared.
- Item-Descriptor Matching method. In this method, which Steve Ferrara developed, content experts determine cut scores by identifying sequences of items on the test scale that match descriptors for basic, proficient, and advanced achievement levels. This method can be applied for multiple-choice and other selected-response items, and for constructed-response and other items that have multiple potential score points. Similar to Bookmark, it requires items to be placed on a single uniform scale.
- Body of Work method. This method differs from those above because it focuses on students rather than items. Content experts sort work products or recordings of student performances into categories defined by levels of performance such as basic, proficient, and advanced. This method was developed at Measured Progress to set standards for assessments that involve evaluating more extensive student work products and performances, such as portfolio assessments.
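The arithmetic behind the first two methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any particular program: the function names are hypothetical, the Angoff sketch averages panelists' summed ratings to get a raw-score cut, and the Bookmark sketch assumes items are calibrated with a Rasch model and takes the median of the panelists' bookmark locations. Operational workshops typically add multiple rating rounds, panel discussion, and a conversion to the reporting scale.

```python
import math
from statistics import mean, median


def angoff_cut_score(ratings):
    """Raw-score cut from Angoff ratings.

    ratings[p][i] is panelist p's estimated probability that a student who
    is just barely in the achievement level answers item i correctly.
    Each panelist's ratings are summed, then the sums are averaged.
    """
    return mean(sum(panelist) for panelist in ratings)


def bookmark_cut_score(difficulties, bookmarks, rp=0.67):
    """Ability-scale cut from Bookmark placements (Rasch model assumed).

    difficulties: item difficulties on a common scale.
    bookmarks: for each panelist, the 0-based position (in order of
        increasing difficulty) of the last item a just-barely-qualified
        student would get correct.
    rp: response probability criterion, such as 0.67 or 0.50.
    """
    ordered = sorted(difficulties)
    # Under the Rasch model, P(correct) = rp when ability = b + ln(rp / (1 - rp)).
    offset = math.log(rp / (1 - rp))
    return median(ordered[b] + offset for b in bookmarks)
```

For example, `angoff_cut_score([[0.5, 0.5, 1.0], [0.25, 0.75, 1.0]])` sums each panelist's ratings to 2.0 and returns their average, 2.0 raw-score points. With `rp=0.5` the Bookmark offset is zero, so the cut falls exactly at the bookmarked item's difficulty; with `rp=0.67` it sits roughly 0.71 logits higher.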
More information on standard setting
For information on how standards are set for a particular assessment program, contact the program’s staff and read the program’s technical report.
Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests, a somewhat non-technical book by Gregory Cizek and Michael B. Bunch, provides general information on standard-setting methods and processes.