Your teachers’ bleary-eyed nights of reading stacks of essays on “How I spent my summer vacation” might soon be over, thanks to an essay-grading software program.
Relying on a technology called “latent semantic analysis,” the software program allows a computer to grade student essays. It soon might be available to schools, but many educators question whether evaluating essays, in which students must synthesize their knowledge of a specific topic, is an appropriate use of today’s technology.
The software, called Intelligent Essay Assessor (IEA), uses mathematical analysis to measure the quality of knowledge expressed in writing. The program was developed by Thomas Landauer, a psychology professor at the University of Colorado at Boulder (CU-Boulder), and debuted at the American Educational Research Association (AERA) annual meeting on April 16.
“One of our goals is to have the instructor spend more time teaching and the students writing more essays,” Landauer said.
Although essay tests provide a better assessment of a student’s knowledge than other types of exams, he noted, they are often time-consuming and difficult to grade fairly and accurately, especially for teachers with large classes or for nationally administered exams.
Landauer has worked on the technology behind the program for 10 years, along with CU-Boulder doctoral candidate Darrell Laham and New Mexico State University psychology professor Peter Foltz.
IEA’s developers see the program as a way to fit more written work into a student’s evaluation, instead of relying on term-recognition methods such as multiple-choice tests. Teachers with large classes could use the software to supplement their grading of hundreds of essays and thereby ease their workload.
How the program works
The program does more than just count words or analyze mechanics and grammar, the way earlier essay scoring applications did. Laham said his program can look at large chunks of text and determine the similarity between them.
The technology behind the software is a new type of artificial intelligence much like a neural network. “In a sense, it tries to mimic the function of the human brain,” Laham said.
First, the program is fed information about a topic from online textbooks or other sources. It “learns” from the text and then assigns a mathematical degree of similarity between the meaning of each word and any other word. This allows students to use different words that mean the same thing and receive the same score–words such as “physician” and “doctor,” for example.
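The idea behind this "learning" step is latent semantic analysis: a truncated singular value decomposition of a term-by-passage count matrix, which places words that appear in similar contexts close together even if they never occur side by side. Here is a minimal sketch of that idea (all counts and terms are hypothetical toy data, not the actual IEA implementation):

```python
import numpy as np

# Toy term-by-passage count matrix. Passage 1 is about a doctor,
# passage 2 about a physician, passage 3 about music.
X = np.array([
    [2, 0, 0],   # "doctor":    appears only in passage 1
    [0, 2, 0],   # "physician": appears only in passage 2
    [1, 1, 0],   # "hospital":  appears in both medical passages
    [0, 0, 3],   # "guitar":    appears only in the music passage
], dtype=float)

# Latent semantic analysis: a truncated singular value decomposition
# projects each term into a low-dimensional "meaning" space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                      # keep the two strongest latent dimensions
terms = U[:, :k] * s[:k]   # term vectors in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "doctor" and "physician" never appear in the same passage, yet both
# co-occur with "hospital", so LSA places them close together.
print(round(cosine(terms[0], terms[1]), 2))  # 1.0 (near-synonyms)
print(round(cosine(terms[0], terms[3]), 2))  # 0.0 (unrelated terms)
```

This is why, as described above, a student who writes "physician" where the textbook says "doctor" is not penalized: in the latent space the two words occupy nearly the same position.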
Next, the teacher grades enough essays to provide a statistical sample of the range from good to bad examples–say, 30 to 40 out of a total of 100 essays, Laham said. The computer then can grade the rest.
“It takes the combination of words in the student essay and computes its similarity to the combination of words in the comparison essays,” Laham said.
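The comparison Laham describes can be sketched as scoring a new essay against the teacher-graded samples. The weighting scheme below (a similarity-weighted average of the teacher's scores) is an illustrative assumption, not IEA's published formula, and the essay vectors are hypothetical stand-ins for vectors produced by the latent-space step above:

```python
import numpy as np

# Hypothetical latent-space vectors for essays a teacher has already
# graded, together with the scores the teacher assigned.
graded_essays = np.array([
    [0.9, 0.1, 0.0],   # strong, on-topic essay
    [0.7, 0.3, 0.1],   # good essay
    [0.2, 0.1, 0.9],   # weak, largely off-topic essay
])
teacher_scores = np.array([95.0, 85.0, 40.0])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_score(essay):
    """Score a new essay as the similarity-weighted average of the
    teacher's scores on the pre-graded comparison essays."""
    sims = np.array([cosine(essay, g) for g in graded_essays])
    weights = np.clip(sims, 0.0, None)   # ignore negative similarity
    return float(weights @ teacher_scores / weights.sum())

# A new essay whose word combinations resemble the strong samples
# receives a score pulled toward the high end of the scale.
new_essay = np.array([0.8, 0.2, 0.05])
print(round(predict_score(new_essay), 1))
```

With a graded sample of 30 to 40 essays, as described above, the remaining essays can be scored automatically against those reference points.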
Laham said that in test after test, the program showed the same range of consistency between a human grader and a computer as there was between two different human graders. “The program has perfect consistency in grading–an attribute that human graders almost never have,” Laham said. “The system does not get bored, rushed, sleepy, impatient, or forgetful.”
But skeptics persist
Though it has proven consistent in trial runs, the program already has drawn its share of critics. Writing, they argue, is meant to teach people to communicate with one another.
Mary Burgan, executive director of the American Association of University Professors, said the program misses the point of having students write essays.
“I think it’s a terrible idea. Education is not about spewing back information but assimilating it into language,” Burgan told the Associated Press.
Richard Anderson, professor of educational psychology at the University of Illinois at Urbana-Champaign, said, “I feel [the program’s creators] are doing a good job, but I have a general worry that these types of systems will have unintended consequences in terms of how students prepare [for essay exams].”
Anderson served as a discussant for an AERA presentation of the software. Although the program can perform a sophisticated form of vocabulary matching, he said, its main problem is that it cannot analyze syntactical relationships–the ‘who does what to whom.’
For example, if a student were asked to write an essay about the New Deal and its impact on the Great Depression, the program could look to see if all the relevant terms were there: Franklin Roosevelt, “alphabet agencies,” 1930s, unemployment, Tennessee Valley Authority, and so on. But the program might not be able to distinguish between an essay with the statement, “In 1933, Roosevelt signed the TVA Act” and one with the statement, “In 1933, Roosevelt vetoed the TVA Act.”
The software can assess whether students are writing in typical English sentences, thereby preventing students from simply listing key concepts without tying them together in essay form. It can also recognize when students are straying too far from the topic at hand, and it directs such essays to the teacher’s attention for a closer review.
But Anderson said he thought the system could be beaten once students put their minds to it–and he feared they would.
“[IEA’s makers] need to do a field trial outside of the ‘friendly’ environment of their schools and their students to look for any unintended side effects,” Anderson said.
Bring on the beta testing
That’s precisely what Landauer and his colleagues plan to do. They have applied for a patent and are now seeking reactions from other educators.
Laham acknowledged the program’s shortcomings. “This isn’t meant to replace creative writing or term-paper grading,” he said. Nor is it meant to evaluate language or rhetorical skills. Rather, the software works best when used to measure content knowledge derived from short-answer, directed responses.
Studying still pays off
Landauer sees another application for the software: backing up or checking a human grader’s evaluation. For example, it could be used where two or three human graders are normally required to ensure that students receive a fair score, as on a final exam or a national test.
What about the argument that the system can be beaten on an exam? “We’ve tried to write bad essays and get good grades, and we can sometimes do it if we know the material really well,” Landauer said. “The easiest way to cheat this system is to study hard, know the material, and write a good essay.”