Researchers have once again shown that sensitive data, supposedly anonymised so as not to reveal its subjects, can be re-constituted with relative ease.
Data scientists from London's Imperial College and the Université Catholique de Louvain in Belgium had a crack at estimating the likelihood of a specific person being correctly re-identified in even heavily incomplete, anonymised datasets.
Their Gaussian copula-based method turned out to be very accurate.
"Using our model, we find that 99.98 percent of Americans would be correctly re-identified in any dataset using 15 demographic attributes.
"Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymisation set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model," the researchers wrote in the Nature Communications scientific journal.
The fact that people in anonymised data sets can be re-identified could lead to legal trouble for companies and organisations that use the information, whether for commercial purposes or for the public good.
The General Data Protection Regulation (GDPR) introduced by the European Union and the recent California Consumer Privacy Act require that each and every person in a data set be protected for the information to be considered anonymous, the researchers said.
Re-identifiable data sets could fall foul of strict new privacy laws aimed at protecting individuals from having their sensitive personal information used against them.
Researchers have warned for several years that the anonymisation of data sets traded to third parties or made public falls well short of what's required to protect people's privacy.
In 2017, University of Melbourne (UoM) researchers Drs Chris Culnane, Benjamin Rubinstein and Vanessa Teague were able to easily re-identify patients in Medicare and Pharmaceutical Benefits Scheme data sets that had been released to the public in 2016.
Individuals could be matched to their records simply by using existing information such as their year of birth and medical procedures.
The UoM researchers were able to match patient records to seven prominent Australians, including three former or current members of parliament and an AFL footballer, by using information publicly available online.
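To illustrate the kind of matching described above, here is a minimal sketch in Python. It is not the UoM researchers' code or method. The records, names, field names and the `link` function are all hypothetical, invented purely to show how a few known facts can single out one row in a data set that has had names removed.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, other attributes kept.
released = [
    {"record_id": "r1", "birth_year": 1961, "procedure": "knee surgery", "procedure_year": 2014},
    {"record_id": "r2", "birth_year": 1961, "procedure": "appendectomy", "procedure_year": 2014},
    {"record_id": "r3", "birth_year": 1975, "procedure": "knee surgery", "procedure_year": 2014},
    {"record_id": "r4", "birth_year": 1975, "procedure": "knee surgery", "procedure_year": 2014},
]

# Hypothetical facts gathered from public sources about named people.
public = [
    {"name": "Person A", "birth_year": 1961, "procedure": "knee surgery", "procedure_year": 2014},
    {"name": "Person B", "birth_year": 1975, "procedure": "knee surgery", "procedure_year": 2014},
]

# Attributes assumed to appear in both sources.
KEYS = ("birth_year", "procedure", "procedure_year")


def link(released, public, keys=KEYS):
    """Return {name: record_id} for people who match exactly one released record."""
    # Group released records by their combination of shared attributes.
    index = defaultdict(list)
    for rec in released:
        index[tuple(rec[k] for k in keys)].append(rec["record_id"])

    matches = {}
    for person in public:
        candidates = index.get(tuple(person[k] for k in keys), [])
        # Only a single candidate counts as a re-identification here.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches


matches = link(released, public)
print(matches)  # {'Person A': 'r1'}
```

In this toy data, Person A is matched because only one record shares all three attributes, while Person B is not because two records are indistinguishable on them. Each extra known attribute tends to shrink the set of candidates, which is why data sets with many attributes per person are hard to anonymise.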