After spending hours taking standardized tests, students often have to wait months to receive their scores. What inevitably takes the longest to grade are the essays. The education company Pearson estimates that human graders, working at their fastest, can grade about 30 writing samples an hour. New grading software, capable of scoring 16,000 essays in 20 seconds, may just be the answer to this problem.
Mark Shermis, dean of the College of Education at the University of Akron, conducted a study to test the accuracy of robo-graders. He collected over 16,000 middle and high school test essays from six states that had already been graded by humans and then used automated systems to score those same essays. According to a university news release, the results showed “virtually identical levels of accuracy, with the software in some cases proving to be more reliable.”
However, an article in The New York Times by Michael Winerip challenges the results of the study. Les Perelman, a director of writing at MIT, taught himself to think like e-Rater, which allowed him to find many ways to manipulate the system. Perelman said that "the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction." Perelman wrote several essays to test the system, two of which responded to a question asking why college costs are so high. Since e-Rater cannot check facts and favors longer essays with big words, Perelman wrote a 716-word essay filled with random sentences and accusations against greedy teaching assistants, and it received a top score of 6. A shorter, well-argued essay received only a 5. You can read his top-scoring essay here.
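To see why surface features like length and vocabulary are so easy to game, consider a toy scorer. This is a deliberately simplified sketch, not e-Rater's actual algorithm (which is proprietary): it rewards only word count and average word length, so padded nonsense can outscore a concise argument.

```python
# Toy illustration only -- NOT e-Rater's real scoring model.
# A naive scorer that looks at surface features (length, big words)
# and ignores meaning entirely, showing why such features are gameable.

def naive_score(essay: str) -> float:
    """Score an essay on word count and average word length alone."""
    words = essay.split()
    if not words:
        return 0.0
    avg_word_length = sum(len(w) for w in words) / len(words)
    # Reward sheer length plus fancy vocabulary; content never enters in.
    return 0.01 * len(words) + avg_word_length

# A long string of impressive-sounding filler (hypothetical example text)...
padded = " ".join(
    ["Multitudinous perspicacious individuals excoriate remuneration"] * 20
)
# ...versus a short, plausible, on-topic claim.
concise = "College costs rise because state funding fell while demand grew."

# The padded nonsense outscores the concise argument.
print(naive_score(padded) > naive_score(concise))  # prints True
```

A real automated scorer uses many more features than this, but Perelman's experiments suggest the same basic vulnerability: when a model rewards proxies for good writing, a writer who knows the proxies can satisfy them without writing well.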
What do you think? Will robo-graders ever have the same appreciation for and understanding of human language and writing that human graders do? Or are they simply tools that supplement human grading?