The national statistical regulator is intervening to review the algorithm used by Ofqual to decide the A-level grades for students who could not take the exams.
One expert said the process was fundamentally flawed and the exam watchdog’s chosen algorithm essentially “cheated.”
Amid a public outcry, the government has decided not to use the algorithm's output to determine student grades.
The episode raises questions about the oversight of algorithms used in society.
The results produced by the algorithm left many students unhappy, led to widespread protests, and were eventually abandoned by the government in favor of teacher-led evaluations.
The Office for Statistics Regulation (OSR) said it will now carry out an urgent review of Ofqual’s approach.
“The review will seek to highlight learning from the challenges faced in these unprecedented circumstances,” it said.
Tom Haines, a professor of machine learning at the University of Bath, studied the documentation released by Ofqual which outlines how the algorithm was designed.
“A lot of mistakes were made at many different levels. This included technical mistakes where the people implementing the concepts didn’t understand what the math they typed meant,” he said.
As part of the process, Ofqual tested 11 different algorithms, tasking them with predicting grades for the 2019 exams and comparing the predictions with the actual results to see which was the most accurate.
But according to Mr. Haines: “They got it wrong and actually provided the algorithms with the 2019 results, so the algorithm they eventually selected was the one that was essentially the best at cheating.”
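The kind of error Haines describes can be illustrated with a minimal Python sketch. It is not Ofqual's code, nor any of the 11 algorithms it tested: the synthetic cohort, the two toy models and every name below are assumptions made purely for illustration.

```python
import random

def make_cohort(seed=0, n=200):
    """Build a synthetic cohort (hypothetical data, not real exam results)."""
    rng = random.Random(seed)
    cohort = []
    for student_id in range(n):
        prior = rng.randint(0, 6)              # attainment known before the exam
        noise = rng.choice([-1, 0, 0, 1])      # exam-day variation
        grade = min(6, max(0, prior + noise))  # actual result, on a 0-6 scale
        cohort.append({"id": student_id, "prior": prior, "grade": grade})
    return cohort

def honest_model(student, leaked=None):
    """Predict from prior attainment only; ignores any leaked results."""
    return student["prior"]

def leaky_model(student, leaked=None):
    """Copy the true result if it is (wrongly) supplied, else guess a middle grade."""
    if leaked is not None:
        return leaked[student["id"]]
    return 3

MODELS = {"honest_model": honest_model, "leaky_model": leaky_model}

def accuracy(model, cohort, leaked=None):
    """Fraction of students whose predicted grade matches the actual grade."""
    hits = sum(model(s, leaked) == s["grade"] for s in cohort)
    return hits / len(cohort)

def pick_best(cohort, leaked=None):
    """Return the name of the model with the highest accuracy on this test."""
    return max(MODELS, key=lambda name: accuracy(MODELS[name], cohort, leaked))

cohort = make_cohort()
answers = {s["id"]: s["grade"] for s in cohort}  # the results being predicted

# Flawed test: the models are handed the very results they are meant to predict.
print("flawed test selects:", pick_best(cohort, leaked=answers))
# Clean test: the results are withheld, as they would be in a real year.
print("clean test selects: ", pick_best(cohort, leaked=None))
```

Under these assumptions, the flawed test rewards the model that simply copies the answers, while the clean test favors the model that relies only on information available beforehand.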
Haines said there was a need for far greater oversight of the process by which algorithms make decisions.
“A few hundred years ago, people put up a bridge and hoped it would work. We don’t do that anymore: we check, we validate. The same has to be true of algorithms. We are still back at that stage of a few hundred years ago, and we have to realize that these algorithms are man-made artifacts, and if we don’t look for problems there will be consequences.”
“Forbidden to speak”
In response, Ofqual told the BBC: “Throughout the process, we have had an expert advisory group in place, which we first met in early April.
“The group includes independent members from the statistical and evaluation communities. The advisory group provided advice, guidance, insight and expertise as we developed the details of our standardization approach.”
The Royal Statistical Society (RSS) had offered Ofqual the assistance of two of its statisticians, chief executive Stian Westlake told the BBC.
“Ofqual said it would only consider them if they signed a burdensome nondisclosure agreement that would effectively ban them from talking about anything they had learned from the process for up to five years,” he said.
“Since transparency and openness are core values for the RSS, we felt we couldn’t say yes.”
Ofqual’s chairman, Roger Taylor, also chairs the UK’s Centre for Data Ethics and Innovation, a body set up by the government to provide oversight of government data use.
The centre confirmed to the BBC that it was not invited to review the algorithm or the processes that led to its creation, stating that it was not its job to “check the algorithms of organizations”.
Haines said: “It appears that these bodies are created by companies and governments because they feel they should have them, but they are not given actual power.
“It is a symbolic gesture, and we must realize that ethics is not something you apply at the end of any process; it is something you apply throughout.”
The RSS welcomed the OSR’s review and said it hoped lessons would be learned from the fiasco.
“The process and the algorithm have been a failure,” Westlake said.
“There were technical flaws, but there were also flaws in the choices made when it was designed and the constraints it worked under.
“It had to balance grade inflation with individual injustice, and even if there was minimal grade inflation, there were a lot of people who were disappointed and it created an obvious sense of injustice.
“It’s not a statistical problem, it’s a choice of how to build the algorithm.”
Algorithms are used at all levels of society, ranging from the very basic to complex examples that use artificial intelligence.
“Most algorithms are entirely reasonable, straightforward, and well-defined,” Haines said – but he cautioned that as they became more complex in design, society had to stop and consider what it wanted from them.
“How do we handle algorithms that are making decisions and don’t make the ones we assume they will? How do we protect ourselves from this?”
And some things should never be left to an algorithm to determine, he said.
“No other country has done what we did with exams. Either they figured out how to hold exams or they had essays for which they took averages. Ultimately, the point of exams is for students to determine their own future, and you can’t achieve that with an algorithm.
“Some problems just need a human being.”