"Mass. ponders hiring a computer to grade MCAS essays. What could go wrong?" by James Vaznis Globe Staff May 21, 2018
In an effort to speed up the delivery of MCAS results to schools and families, the state Department of Elementary and Secondary Education is exploring the idea of replacing human test scorers with a computer program.
That’s right: Students would be producing a response that may never be read by anyone, denying them the chance to stir a reader’s emotion, draw a laugh, or sway an opinion. Instead, the responses would be processed by an algorithm that evaluates such elements as sentence structure, word choice, and length.
That's more and more of the morning paper with me these days.
“When used responsibly and carefully, automated essay scoring is faster and cheaper and can even exceed the validity of human scores,” said Jon Cohen, executive vice president at American Institutes for Research and president of AIR Assessment, which provides automated essay scoring in standardized testing.
There is an element of subjectivity in judging writing (what appeals to one person might not appeal as strongly to another), but automated scoring has drawn many vocal detractors. Les Perelman, a researcher and retired Massachusetts Institute of Technology writing professor, argues that automated scoring systems are not only inaccurate but are detrimental to writing instruction. In an era of teaching to the test, he said, teachers will drill strategies into their students to game the computer programs to get higher scores.
To test just how unreliable automated scoring systems can be, Perelman and a small group of MIT and Harvard graduate students four years ago generated an essay of gibberish that included obscure words and complex sentences and ran it through a computerized scoring system used for a graduate school admission exam. The nonsensical essay achieved a high score — on the first try.
In another study, Perelman tested the accuracy of an automated grammar check with a famous speech by Noam Chomsky, “The Responsibility of Intellectuals.” It erroneously identified several grammatical errors.
Across the nation, interest in automated scoring has grown as states have been moving their testing systems from paper booklets to the Internet, opening up the possibility of expanding computerized scoring beyond multiple choice questions.
Few states, though, have adopted the technology, according to testing experts, as the practice remains highly controversial.
Earlier this year, Ohio education officials faced a public backlash after they revealed they had quietly implemented automated scoring for student writing on standardized tests. The issue came to light after several districts spotted irregularities in the results, according to media reports.
Fully aware of the skepticism surrounding the technology, Utah treaded carefully when it adopted automated scoring as part of a new standardized testing system during the 2014-15 school year.
Cydnee Carter, Utah’s assessment development coordinator, noted that about 20 percent of the essays continue to be reviewed by people. Utah has instituted other measures to ensure quality control. There have been some glitches, however, she said.
Some students also were attempting to game the computer program by writing one spectacular paragraph and then repeating it over and over again. The computer program now picks up on that.
Transparency, she said, is key.
“Giving teachers and students an opportunity to understand how the essays are being scored ultimately helps students become skilled writers,” Carter said. “We don’t want students gaming the system and not show what they know.”
Massachusetts officials have been toying with the idea of automated scoring of essays since at least 2016. A work group examining changes to the MCAS exams, which consisted of state administrators and local educators, was split on the idea and recommended sharing information with local school systems about it, according to a summary of their discussion.
If Massachusetts pursues automated scoring, state officials said they would likely tap human beings to spot-check the results.
“Personally I can’t imagine going into any automation that doesn’t have multiple levels of backup and quality control,” said Jeff Wulfson, deputy education commissioner. ...
Did you see who failed?