Queensland government agencies have been urged to review all published data and identify datasets containing de-identified data, after the de-identification practices of two unnamed agencies were found to be lacking.
The state’s Office of the Information Commissioner (OIC) made the recommendation in a report released this week, which found a “real risk of re-identification” in several public datasets at one of the agencies.
The report, titled ‘Privacy and public data: Managing re-identification risk’, revealed that three of the four public de-identified datasets examined at that agency were at “significant risk of re-identification”.
The OIC assessed two of the four datasets as having “medium to high risk” of re-identification, which could disclose the personal information of individuals in breach of the state’s Information Privacy Act 2009.
One of these datasets “contains de-identified information about vulnerable individuals that access a particular government service”, the risk analysis conducted with the assistance of CSIRO’s Data61 said.
“There are only a small number of attributes with unique value. However, when combining two attributes, a significant number of entities are unique,” the report states.
“These attributes are approximate information about the individual’s address, and the precise date they accessed the government service.
“On combination of these attributes, an overwhelming 84 percent of entities in this dataset are unique.”
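The effect the report describes, where no single attribute singles anyone out but a pair of attributes does, can be illustrated with a simple uniqueness count. The sketch below is not the OIC or Data61 methodology. It assumes hypothetical field names (`postcode` standing in for approximate address, `access_date` for the date of service) and a handful of invented toy records, and it measures the share of records whose combination of values appears exactly once.

```python
from collections import Counter
from itertools import combinations


def uniqueness(records, attributes):
    """Return the fraction of records whose values for `attributes` occur exactly once."""
    # Build a key per record from the chosen attributes only.
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy records for illustration only; not data from the OIC report.
records = [
    {"postcode": "4000", "access_date": "2019-03-01"},
    {"postcode": "4000", "access_date": "2019-03-02"},
    {"postcode": "4101", "access_date": "2019-03-01"},
    {"postcode": "4101", "access_date": "2019-03-02"},
    {"postcode": "4101", "access_date": "2019-03-02"},
]

attrs = ["postcode", "access_date"]
for size in (1, 2):
    for combo in combinations(attrs, size):
        # Single attributes leave nobody unique here; the pair makes 3 of 5 records unique.
        print(combo, uniqueness(records, combo))
```

In this toy data neither attribute alone identifies anyone, yet combining them makes 60 percent of records unique, a smaller-scale version of the jump to 84 percent reported for the real dataset.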
By comparison, the other audited agency “had relatively low risk scores” on all four of its examined datasets, and used “de-identification techniques to effectively reduce the risk of re-identification to generally low levels”.
Neither agency, however, was found to monitor and review re-identification risk in the examined datasets, meaning risk management strategies could be outdated.
Nor could either agency “consistently demonstrate how it developed de-identification techniques and managed re-identification risk in all four datasets”.
Much of this can be attributed to the finding that “neither agency has appropriate governance arrangements to regularly monitor and review re-identification risk in de-identified datasets”.
“Without these arrangements, neither agency can be confident that risk management strategies remain effective over time,” the report states.
While the OIC found that both agencies had detailed governance arrangements for public data, only one agency’s arrangements included adequate guidance to support the release of de-identified data.
“The other agency’s guidance is not sufficient to support effective re-identification risk management. As a result, its governance arrangements are not adequate to manage the privacy risks of de-identified data,” the report states.
The OIC has not named the audited agencies to “protect the privacy of individuals with personal information in the examined datasets”.
The agencies were chosen “through a risk assessment that considered the volume and sensitivity of released data”.
In light of the findings, the OIC has recommended that all of the state’s government agencies “review all published data and identify datasets containing de-identified data”.
It also provided specific advice for agencies that publish de-identified data, including the two audited agencies, both of which have accepted the recommendations.
These include “assigning a custodian to each published de-identified dataset” and maintaining a dedicated data register for de-identified datasets, as well as implementing and maintaining policies and procedures to govern de-identified data releases.
The OIC also asks that the agencies “monitor the external data environment and the effectiveness of risk treatments, and regularly review existing de-identified datasets”.