Australia is one step closer to marking annual school NAPLAN tests using cognitive computing technology, according to the Australian Curriculum, Assessment and Reporting Authority.
It has unveiled the results of its first formal experiment using cognitive computing systems to mark the national diagnostic exams, finding the software can assess written responses just as consistently as a human.
The trial saw ACARA pit two different human markers against four products that automate the assessment process using cognitive computing algorithms to mark 339 test essays written by students in years three, five, seven and nine.
Each product was ‘trained’ to reverse engineer the linguistic and semantic features of an initial set of around 1000 already-marked writing exercises - such as sentence structure, cohesion, paragraphing and spelling - to replicate the scoring process performed by teachers hired as NAPLAN markers.
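The training step described above can be sketched as a simple supervised-learning loop: extract numeric features from each pre-marked essay, then fit a model that maps those features to the human score. The snippet below is a deliberately minimal illustration with invented essays and scores, reducing "linguistic and semantic features" to crude counts and fitting a one-feature least-squares line; the commercial systems use far richer features and models.

```python
import re

# Hypothetical training data: (essay text, human-assigned score).
# The real trial trained on ~1000 pre-marked scripts; this toy set
# just shows the shape of the data.
training = [
    ("The dog ran. It was fast. We liked it.", 2),
    ("Yesterday our class visited the museum. We saw dinosaur bones "
     "and learned how fossils form over millions of years.", 4),
    ("My holiday was great because we went camping, cooked dinner over "
     "a fire, and told stories until late at night.", 5),
]

def features(text):
    """Crude stand-ins for the linguistic features a real scorer extracts
    (sentence structure, cohesion, spelling and so on)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),                                       # essay length
        len(words) / max(len(sentences), 1),              # avg sentence length
        sum(len(w) for w in words) / max(len(words), 1),  # avg word length
    ]

def fit(rows):
    """One-feature ordinary least squares (on essay length only),
    to keep the maths readable."""
    xs = [features(text)[0] for text, _ in rows]
    ys = [score for _, score in rows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit(training)

def score(text):
    """Predict a score for an unseen essay from its first feature."""
    return slope * features(text)[0] + intercept
```

Once fitted, the model scores new essays without human input, which is what makes the fast turnaround described later in the article possible.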
The results revealed that, on average, each automated system's marks differed from the human markers' scores by no more than the human markers' scores differed from each other.
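That comparison boils down to a simple agreement measure: the average gap between a machine's score and a human's should be no larger than the gap between two humans marking the same scripts. The figures below are invented purely to illustrate the calculation; they are not the trial's data.

```python
# Hypothetical marks for ten essays from two human markers and one
# automated system, on a 0-10 scale (made-up numbers for illustration).
human_a = [5, 7, 6, 4, 8, 6, 5, 7, 3, 6]
human_b = [6, 7, 5, 4, 8, 7, 5, 6, 3, 6]
machine = [5, 6, 6, 4, 8, 6, 6, 7, 3, 6]

def mean_abs_diff(a, b):
    """Average absolute gap between two markers' scores, essay by essay."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

human_vs_human = mean_abs_diff(human_a, human_b)
machine_vs_human = mean_abs_diff(machine, human_a)

# The trial's finding, in these terms: machine_vs_human was no larger
# than human_vs_human, and the difference was not statistically significant.
```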
ACARA has released the results of the test, saying that any differences between human and machine exam scores were “not statistically significant”.
“What is exciting about this research is that, although the four vendors had different automated essay scoring systems, they were all able to mark the essays as well as the human markers,” ACARA CEO Robert Randall said in a statement.
The four systems used were Project Essay Grade (PEG) from Measurement Incorporated, the Intelligent Essay Assessor from Pearson, the Constructed-Response Automated Scoring Engine (CRASE) from Pacific Metrics, and the Lexile Writing Analyser from MetaMetrics.
The agency plans to make a final decision about whether to go ahead with automated marking of NAPLAN written responses in 2017.
NAPLAN - the national assessment program for literacy and numeracy - is the annual diagnostic testing exercise run across all Australian students in years three, five, seven and nine.
It is intended to track their progress against reading, writing and mathematics benchmarks, and the results inform teaching strategies and are aggregated on the mySchool website.
Randall said the advantage of automated marking is that parents will be able to access their child’s marks within two weeks of the child sitting the exam, and teachers can respond to any deficiencies in a student’s performance much faster.
ACARA plans to continue the testing program throughout 2016. It will re-run the same kinds of tests across different types of written responses and across larger sample sizes.
The agency is also looking at ways to test whether an awareness of software-based marking will affect the way students approach the exams, or even the way teachers prepare for them.
However, it still faces an uphill battle to convince some parents and teachers that cognitive computing will be able to capture the nuance and creativity of all responses.
It has already raised the ire of the NSW Teachers Federation, despite assurances that triggers will be built into the system to flag anomalous papers for review by a human.
"If need be, we could double mark samples of student essays, until everyone is comfortable with automated essay scoring," Randall said.
“The research results show that automated essay scoring works for NAPLAN-type writing, but we will continue with our research to refine the system and to gather more evidence, which we will use to assure parents and teachers of the viability of automated essay scoring and to make a final decision about proceeding.”