The Australian Bureau of Statistics' willingness to play with large amounts of sensitive personal information gleaned via the Census could turn into a privacy nightmare the likes of which the country has never seen.
Forget about the e-Census falling over on the night the population snapshot was meant to be taken: the data it is collecting could be traced right back to each individual citizen, despite promises of anonymity.
The government agency has gone for gold linkage using Australians’ names and addresses in order to extract the maximum value out of its data sets - in defiance of warnings from former chief statistician Bill McLennan that this would represent the "most significant invasion of privacy ever perpetrated on Australians by the ABS".
Australians had no choice but to hand over their names and addresses for usage by the ABS this year - the first time it has kept the details in order to link cross-government datasets - in the face of heavy fines and potential prosecution.
This policy change could turn out to be disastrous for everyone in the country.
Why? Because the statistical linkage keys (SLK-581) used by the ABS for the Census appear to be rather easy to decipher, making it possible to identify the individual each one relates to.
The keys are unique identifiers for all of us, used for a range of state services such as health, housing and early childhood records. They’re meant to follow Australians around for life while not actually revealing who’s behind the seemingly random combination of numbers and letters.
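To see why such a key is only "seemingly" random, here is a minimal sketch of how an SLK-581 is assembled, going by the format as it is publicly documented for health and community services data: the 2nd, 3rd and 5th letters of the family name, the 2nd and 3rd letters of the given name, the date of birth as DDMMYYYY, and a one-digit sex code. The function name `slk581` and the sample person are made up for illustration, the handling of entirely missing names is left out, and whether the ABS applies any further encoding on top is an open assumption.

```python
from datetime import date


def slk581(family_name: str, given_name: str, dob: date, sex_code: int) -> str:
    """Build a 14-character SLK-581 style key (illustrative sketch only)."""

    def pick(name: str, positions: tuple) -> str:
        # Keep letters only, so hyphens, spaces and apostrophes are ignored.
        letters = [c for c in name.upper() if c.isalpha()]
        # A position beyond the end of the name is padded with '2'.
        return "".join(
            letters[p - 1] if p <= len(letters) else "2" for p in positions
        )

    return (
        pick(family_name, (2, 3, 5))   # 3 letters from the family name
        + pick(given_name, (2, 3))     # 2 letters from the given name
        + dob.strftime("%d%m%Y")       # date of birth, in the clear
        + str(sex_code)                # 1 = male, 2 = female
    )


# Hypothetical person, not real data.
key = slk581("Citizen", "Jane", date(1980, 3, 15), 2)
print(key)  # ITZAN150319802
```

In this sketch the date of birth and sex sit in the key unaltered, and five letters of the name come along with them.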
But they're not secure by a long way. As Justin Warren, managing director of Melbourne consultancy PivotNine, points out, there have been multiple examples of anonymous data on individuals being re-identified over the past decade.
And thanks to hacker and activist Cameron Moon, that's been made a whole lot easier for the Census SLKs.
"It's really hard to anonymise individual level datasets, and those releasing this sort of data should know that by now," Warren said.
"I would go so far as to suggest that it's negligent, though of course that's something the courts will need to decide."
SLKs are a "shallow" privacy measure at best, according to Constellation Research principal analyst Steve Wilson.
"They make it hard for a casual human observer to tell who a record relates to. But that is not the privacy threat we need to worry about," he said.
"The larger privacy threat these days is re-identification of masses of data achieved by linking it to other masses of data" - such as the one billion lines of claims data dating back to 1984 just released by the Department of Health.
One of the ways breakable SLKs can be abused through re-identification is in human research: scientists wouldn't actually have to get someone's informed consent to use their data, a massive two-finger salute to university ethics committees.
More obviously, marketers, blackmailers, abusive ex-spouses, stalkers, and haters in general would also find the data useful.
"Witness what happened with the 1000 Genomes project in 2013. That project took DNA samples from anonymous donors and published it for scientific purposes," Wilson said.
"Researchers at MIT took the anonymous DNA, mashed it with publicly available family tree databases, and with clever new algorithms, managed to re-identify 12 percent of the men amongst the 1000.
"Data scientists are constantly inventing new ways to re-identify big data. There is no way that the ABS can keep its anonymity promise."
Clearly there are plenty of groups who will be motivated to put in the effort to compare the Census data with other linked information sets to re-identify individuals. And with cheap computing power so easily accessible, it won't be a laborious task.
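How little effort that comparison takes can be shown with a rough sketch. It assumes a nameless dataset and a named one that share a few ordinary fields; the field names (`postcode`, `birth_year`, `sex`), the function name and every record below are invented for illustration and describe no real release.

```python
from collections import defaultdict


def link_on_quasi_identifiers(anonymous_rows, named_rows, keys):
    """Return {record_id: name} for anonymous rows matching exactly one named row."""
    # Index the named dataset by the shared fields.
    index = defaultdict(list)
    for row in named_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = {}
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only a unique candidate counts as a re-identification.
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Made-up records: one "anonymous" release and one public, named list.
anonymous = [
    {"record_id": "r1", "postcode": "3000", "birth_year": 1975, "sex": "F"},
    {"record_id": "r2", "postcode": "2000", "birth_year": 1990, "sex": "M"},
]
named = [
    {"name": "Jane Citizen", "postcode": "3000", "birth_year": 1975, "sex": "F"},
    {"name": "John Smith", "postcode": "2000", "birth_year": 1990, "sex": "M"},
    {"name": "Joe Bloggs", "postcode": "2000", "birth_year": 1990, "sex": "M"},
]

print(link_on_quasi_identifiers(anonymous, named, ("postcode", "birth_year", "sex")))
# {'r1': 'Jane Citizen'}
```

Record `r2` stays ambiguous here only because two named people share its fields; every extra field in common shrinks that cover. The whole thing is a single pass over each dataset, which is why scale offers little protection.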
Be prepared for a taxpayer-funded security and privacy mess of epic proportions over the next few years.
Was that really the purpose of the amazing digital e-Census of 2016?