Should student test scores be used to evaluate teachers?

The so-called value-added model is an “imperfect, but still informative” measure of teacher effectiveness, especially when it is combined with other measures, according to the preliminary results of a large-scale study funded by the Bill and Melinda Gates Foundation. The study’s early findings have ratcheted up the debate over whether student test scores should be used in evaluating teachers—and if so, how.

The report, entitled “Learning About Teaching: Initial Findings from the Measures of Effective Teaching Project,” reportedly gives the strongest evidence to date of the validity of the value-added model as a tool to measure teacher effectiveness.

The $45-million Measures of Effective Teaching (MET) Project began in the fall of 2009 with the goal of building “fair and reliable systems for teacher observation and feedback.”

Teacher quality is important, but “teacher evaluation is a perfunctory exercise,” the report said. Principals tend to go through the “motions” of evaluation, and “all teachers receive the same ‘satisfactory’ rating.” The study aims to fix this “neglect” and help devise a system that gives teachers the feedback they need to grow, the report said.

What is the value-added model?

Value-added is a controversial statistical method that relies on test-score data to determine a teacher’s effectiveness. Each student’s performance on past standardized tests is used to predict how he or she will perform in the future. Any difference between the student’s projected result and how the student actually scores is the estimated “value” that the teacher has added or subtracted during the year.

The value-added model is thought to bring objectivity to teacher evaluations, because it compares students to themselves over time and largely controls for influences outside teachers’ control, such as poverty and parental involvement.
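The arithmetic behind that description can be illustrated with a minimal sketch. It assumes a deliberately simplified setup that is not the MET Project's actual model: each student's score is predicted from a single prior-year score with a straight-line fit, and a teacher's value-added is the average gap between actual and predicted scores. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns each teacher's mean residual: actual score minus the score
    predicted from the student's prior result.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = a + b * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical data: two teachers, three students each.
records = [
    ("A", 50, 58), ("A", 60, 67), ("A", 70, 78),
    ("B", 50, 50), ("B", 60, 61), ("B", 70, 68),
]
scores = value_added(records)
print(scores)
```

In this toy data, teacher A's students beat their projections and teacher B's fall short, so A's estimate is positive and B's is negative. Real value-added models draw on more than one prior score and adjust for other factors, so this shows only the core idea of comparing projected and actual results.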

Value-added has been a buzzword among educators since the Obama administration’s “Race to the Top” grant program began promising money to school systems that adopt certain requirements, such as evaluating teachers’ performance by using factors like student achievement.

Critics of the value-added model fear school leaders might make serious decisions about individual teachers based on these projections alone.

“This is a problem with value-added,” said Raegen T. Miller, associate director for education research at the Center for American Progress. “So far, value-added has been on its own. People are very scared that administrators would start making serious decisions about individual teachers just based on that information—and nobody thinks that should be done. It doesn’t take away people’s fear of it. We can write all we want that we should use multiple measures; now, we actually [should] start having multiple measures.”

That’s the good news that comes from the preliminary findings of the MET Project. Based on these findings, researchers recommend that school leaders use multiple measures, in addition to value-added, to evaluate teachers’ effectiveness.

“I think people who are using test scores and using them wisely include value-added estimates with other measures of evaluations,” said Bruce Hunter, associate executive director of the American Association of School Administrators.

“If that’s what they are promoting, using a wide range of factors, and having the test score be part of—but not half, or more than half; [instead,] a more minor part—I don’t know why anybody would object.”

The preliminary results of the Gates Foundation research sound fair and reasonable, Hunter said: “We have to improve the assessments. We have to improve the observations, but we have to get the right mix of factors.”

More about the study

Nearly 3,000 teachers from six urban school districts volunteered for the study. The participating districts are Charlotte-Mecklenburg Schools in North Carolina, Dallas Independent School District, Denver Public Schools, Florida’s Hillsborough County Public Schools, Memphis City Schools, and the New York City Department of Education.

Researchers chose districts that already had state testing and three supplemental tests in place: Stanford 9 Open-Ended Reading Assessment in grades four through eight; Balanced Assessment in Mathematics in grades four through eight; and the ACT Quality Core series for Algebra I, English 9, and Biology.

Over a two-year period, researchers are collecting and analyzing the following measures of teacher effectiveness:

  1. Student achievement gains on state assessments;
  2. Supplemental assessments designed to test higher-order conceptual understandings;
  3. Classroom observations;
  4. Teacher reflections on their practice;
  5. Assessments of teachers’ pedagogical content knowledge;
  6. Student perceptions of classroom instructional environment; and
  7. Teachers’ perceptions of working conditions and instructional support at their schools.

For classroom observations, the MET Project will observe 20,000 lessons via digital video. So far, 13,000 lessons have been recorded.

Early findings

The first report details findings from the first year of the study using two measures: math and language-arts test scores in grades four through eight from five of the six participating school districts, and student perception data.

The preliminary report outlines four general findings.

First, in every grade and subject studied, a teacher’s past success in raising student achievement on state tests (that is, his or her value added) is one of the strongest predictors of his or her ability to do so again. Teachers who lead students to achievement gains in one year or in one class tend to do so in other years and other classes, the report said.

Admittedly, the value-added model has “volatility.” Reasons for instability from year to year could include factors such as significant differences in class size, an influenza outbreak, a group of disruptive students, construction noise during testing, and so on.

“Value-added methods have been criticized as being too imprecise, since they depend on the performance of a limited number of students in each classroom. Indeed, we do find that a teacher’s value-added [result] fluctuates from year to year and from class to class, as succeeding cohorts of students move through his or her classrooms. However, our analysis shows that volatility is not so large as to undercut the usefulness of value-added as an indicator of future performance,” the policy brief for the report said.

Second, the teachers with the highest value-added scores on state tests also tend to help students develop a deeper conceptual understanding. “We see evidence that teachers with high value-added on state tests also seem to help students perform better on the supplemental tests. This seems particularly true in mathematics,” the policy brief said.

In many classrooms, students reported spending a great deal of time preparing for state tests. “The teachers in such classrooms rarely show the highest value-added on state tests,” the policy brief said.

Third, the average student knows effective teaching when he or she experiences it. When collected appropriately, student perceptions of a teacher correlate with the teacher’s value-added estimates.

“When students report positive classroom experiences, those classrooms tend to achieve greater learning gains, and other classrooms taught by the same teacher appear to do so as well,” the policy brief said.

“Students’ perceptions have two other welcome characteristics: They provide a potentially important measure that can be used in non-tested grades and subjects. In addition, the information received by the teacher is more specific and actionable than value-added scores or test results alone,” it added.

Fourth, valid feedback need not be limited to test scores alone. By combining different sources of data, it is possible to provide diagnostic, targeted feedback to teachers who are eager to improve.

“The public discussion usually portrays only two options: the status quo (where there is no meaningful feedback for teachers) and a seemingly extreme world in which test scores alone determine a teacher’s fate. Our results suggest that’s a false choice. It is possible to combine measures from different sources to get a more complete picture of teaching practice,” the policy brief said.

“Value-added scores alone, while important, do not recommend specific ways for teachers to improve,” it concluded.

Action steps for school leaders

“Reinventing the way we develop and evaluate teachers will require a thorough culture change in our schools,” the policy brief said. The researchers recommend that school leaders begin:

  1. Working with teachers to develop accurate lists of the students in their care, so that value-added data are as accurate as possible;
  2. Using confidential surveys to collect student feedback on specific aspects of a teacher’s practice, including those in non-tested grades and subjects;
  3. Retraining those who do classroom observations to provide more meaningful feedback; and
  4. Regularly checking that the measures they use allow them to explain the variation in student achievement gains among teachers.

“The best way to ensure that the evaluation system is providing valid and reliable feedback to teachers is to regularly verify that—on average—those who shine in their evaluations are producing larger student achievement gains,” the policy brief said.
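One way a district might run that kind of check is sketched below, as an illustration only. It assumes each teacher has a single evaluation rating and a single achievement-gain estimate, and it measures how closely the two move together with a Pearson correlation. The `pearson` helper and the numbers are hypothetical and do not come from the report.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one evaluation rating and one gain estimate per teacher.
ratings = [2.1, 3.4, 2.8, 3.9, 1.7]
gains = [-1.5, 2.0, 0.3, 3.1, -2.2]

r = pearson(ratings, gains)
print(f"correlation between ratings and gains: {r:.2f}")
```

A value near 1 would suggest that teachers who rate highly also tend to produce larger gains. A value near zero would suggest the evaluation is measuring something unrelated to achievement gains and may need rethinking.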

The MET Project plans to release its next analyses in the spring and summer, followed by the final results next winter.

Measures of Effective Teaching Project

Link to the Report: “Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project” (PDF)

eSchool News Staff
