Fewer than 2 percent of the studies evaluating the effectiveness of middle school math software and other curriculum programs meet the federal government’s standards for demonstrating clear scientific evidence of success, according to a recent announcement.

Though educators might conclude from this news that better math software is needed in the nation’s classrooms, that’s not necessarily the case: Some observers familiar with the evaluation process say the federal standard for “scientific” proof is too stringent, and what’s worse, the way findings are presented could mislead educators into thinking that certain programs don’t really work when they might be effective after all.

The announcement about math studies came from the What Works Clearinghouse (WWC), part of the U.S. Department of Education’s (ED’s) Institute of Education Sciences. WWC was founded in 2002 to provide an online repository of reliable education research, giving teachers, administrators, and other school stakeholders scientific proof that the pedagogical approaches tapped for use in their classrooms would help bolster student achievement in accordance with the goals of the federal No Child Left Behind Act (NCLB).

But since NCLB’s inception, educators have wrestled with the law’s most fundamental principles, including its scientifically based research (SBR) provision, which calls for every educational approach deployed in the nation’s classrooms to be proven effective by way of indisputable scientific evidence. While some have struggled to understand what, exactly, constitutes SBR, others have complained that the body of research currently available doesn’t measure up to the standards proposed under the law.

Their frustrations were confirmed last month when the clearinghouse offered one of its first glimpses into how federal officials would interpret the SBR portion of the law, stating that only a small number of the nation’s middle school mathematics curricula exhibit scientific evidence of effectiveness. After reviewing more than 800 studies conducted by educators on a bevy of educational products for mathematics instruction in grades six through nine, WWC researchers found just 11 studies that met the program’s rigorous criteria.

Though some educators have applauded the WWC for helping to weed out those studies that don’t make the grade, others have criticized the online clearinghouse for its approach, saying it gives stakeholders the wrong impression: Just because a study of a program doesn’t make the grade at the WWC doesn’t mean the program is a failure in the classroom, they say. Instead, what it might mean is that the body of research submitted doesn’t measure up to WWC standards.

Clearinghouse officials don’t deny that distinction. In an interview with eSchool News, WWC project director Rebecca Herman said the clearinghouse is not in the business of endorsing products. Its goal is to determine whether the research exhibits indisputable scientific evidence of a program’s effectiveness in schools. In other words, she said, the WWC doesn’t evaluate the curricula; instead, it looks at the methodology exercised by researchers to test the products.

“A lack of scientific evidence does not mean that the product itself is ineffective,” Herman explained.

As a general rule, the organization favors studies based on experimental design, meaning the programs submitted stand the best chance of being approved if their corresponding evaluations include evidence of randomized field trials, where students are placed into control and experimental groups. One group is subject to the intervention, while the other is not. Researchers say experimental design is one of the best ways to eliminate variables in the test pool.

Steve Ritter, vice president and senior cognitive scientist for Pittsburgh-based Carnegie Learning, is among a select few whose research studies have met the clearinghouse’s standards so far.

Ritter’s study, which evaluates the effectiveness of Carnegie Learning’s Cognitive Tutor, a combination software and hands-on problem-solving tool, is the sixth such analysis performed and submitted by researchers in support of the product. So far, it’s the only one to have received WWC’s stamp of approval, represented on its web site by a double checkmark.

Stephane Baldi, project coordinator for WWC’s Middle School Math Review and principal research scientist at the American Institutes for Research, said there is good reason these checkmarks are so hard to come by.

According to Baldi, every research-based study submitted to the WWC is subjected to a rigorous approval process consisting of three separate reports. The first involves several hours of coding performed by WWC research consultants to determine if the product meets the clearinghouse’s evidence-based protocol, as well as a written summary detailing the study’s findings, the study’s rating in relation to the standards, and a list of the program’s strengths and weaknesses as determined by researchers. The time frame for this initial acceptance process is two days or more, depending on the size of the report, he said.

If the study is deemed thorough enough and its results indicate a pattern of effectiveness that can be attributed solely to the intervention, the WWC review team also must create what’s known as an intervention report. This report, intended for educators who visit the clearinghouse in hopes of finding a possible solution for their schools, provides key findings from the study, including a description of the intervention, as well as program details and information about the educational philosophy behind the product. Each intervention report also is linked to the other reports completed by the WWC research team during the initial evaluation phase.

The last phase of the reporting process involves the creation of a topic report, which describes the type of skill the intervention is intended to improve–math, reading, character education, and so on–and the means by which each product accomplishes that goal. Each report contains a compilation of the various intervention reports already completed, as well as a description of the evaluation process performed by the WWC and a list of features available as part of the intervention.

Considering the process that’s involved, Ritter said the WWC has set its standards high–higher, probably, than most people had anticipated. But, he added, there’s a reason for this.

The studies, he explained, have to be of such high quality there can be no question in educators’ minds whether the intervention is responsible for demonstrated gains in student achievement. The idea is to eliminate any variables to whatever extent possible, until all that’s left is the program itself and its effect on learning.

Ritter compared the work just now beginning at the WWC to the type of research that has been conducted for years in the medical field. Instead of relying on theories to produce results, he said, doctors rely on testing and scientific evidence as means to prescribe medicinal cures. With the advent of SBR, he said, teachers will have that ability, too.

But not everyone agrees with the strict criteria the WWC uses to evaluate the submissions it receives.

This past summer, Michael Pressley, director of doctoral programs in education at Michigan State University, criticized the clearinghouse even after it approved a study he co-wrote on the practice of reciprocal teaching.

In an article in Education Week, Pressley said he was unhappy with the way researchers at the WWC categorized his study. Researchers reportedly classified it as a peer-tutoring evaluation instead of its intended focus: reciprocal teaching. Though tutoring represented a small portion of the overall approach, he said, it wasn’t the only aspect of the study.

Pressley also told the newspaper he was unhappy with the WWC’s rejection process. While it lists those interventions that were roundly rejected, he said, it doesn’t provide any explanation for why certain evaluations failed to pass muster.

Carnegie Learning’s Ritter said he agrees with WWC’s critics that the presentation of the research studies listed on the site is somewhat misleading.

Chief among Ritter’s concerns is the way the WWC labels the different studies. For instance, interventions deemed evidence-based are listed on the web site with two checkmarks, while interventions deemed effective with reservations receive a single check. Conversely, interventions that fail to meet the criteria are labeled with a red X.

The problem, says Ritter, is that when educators visit the site, the X will likely be seen as an indication that the intervention, or product, featured does not improve student achievement.

But that’s not always the case, he said. Again, just because the research submitted on the intervention falls short of WWC standards doesn’t mean the product itself is ineffective–though at first glance it might appear that way.

“Some aspects of the presentation are going to be confusing to educators,” said Ritter, who added that he has raised his concerns with administrators at the clearinghouse before.

Ritter also criticized the WWC for contributing to a false notion that scientific research is only valuable if it meets the most rigorous standards.

Just as there is a place in education for large, randomized field tests, he said, there also is a place for smaller, more focused studies. Though randomized field tests might be the metric of choice when deciding whether a particular intervention stands a chance of bolstering student achievement across an entire grade level, Ritter said, lesser studies can be used to answer less ambitious questions.

Unlike research in the medical field, the majority of research performed in education is inherently theoretical in nature. Despite NCLB’s affinity for scientific evidence, Ritter contends, there is still a place for more exploratory kinds of thinking in schools.

“The pendulum has swung too far in one direction,” said Ritter, who compared the studies approved by WWC to those that appear in The New England Journal of Medicine, long considered a leader among medical journals. If the WWC’s intention is to champion only the highest quality research, he said, then there should be a place where lesser studies also can be published. Because of the cost and time it takes to complete most randomized field tests–more than a year, in most cases–“it’s unreasonable to expect every study to be that way,” he said.

Instead of simply accepting or denying the various studies outright, Ritter suggested the WWC should provide some type of guidance to independent researchers as to how they might improve their individual studies and resubmit them for acceptance.

Clearinghouse officials, meanwhile, say they have no immediate plans to change their process for reviewing submissions–though they reportedly are working on ways to make the experience easier on researchers and educators who seek the WWC’s coveted stamp of approval.

Additional resources now under development include an evaluation registry, where educators and service providers can turn to find qualified researchers who can help them evaluate their different programs and prepare them for submission to the clearinghouse.

WWC administrators also plan to set up a help desk that stakeholders can contact to submit any questions they have about the SBR provision or ED’s evaluation process in general.

WWC plans to release future reports, similar to the math study released in November, on the status of SBR for reading and character-education programs.

