The Department of Health has removed a research dataset based on Medicare and PBS claims from its open data portal after a team of Melbourne researchers pointed out that practitioner details could be decrypted.

The government today advised that the data was withdrawn yesterday following “an alert made in the public interest” by researcher Dr Vanessa Teague from Melbourne University on September 12.
Teague told the department that she and her colleagues had analysed 10 percent of the linked dataset and found it was possible to decrypt some of the service provider ID numbers attached to doctors.
“As a result of the potential to extract some doctor and other service provider ID numbers, the Department of Health immediately removed the dataset from the website to ensure the security and integrity of the data is maintained,” the agency said in a statement.
Health made the dataset available in August for the benefit of health and policy researchers looking at patterns of demand in the medical products and services consumed by Australians.
It includes some 30 years' worth of de-identified claims made against the Medicare and Pharmaceutical Benefits Scheme, believed to reach into a billion lines of data. It doesn’t contain any names or addresses of service providers.
The department insisted “no patient information has been compromised, and no information about the health service providers has been publicly identified or released”.
“The Department of Health is undertaking a full, independent audit of the process of compiling, reviewing and publishing this data and this dataset will only be restored when concerns about its potential vulnerabilities are resolved,” it said.
The incident has also been reported to the Office of the Australian Information Commissioner, with Privacy Commissioner Timothy Pilgrim confirming he had commenced an investigation into the breach.
"The primary purpose of the investigation is to assess whether any personal information has been compromised or is at risk of compromise, and to assess the adequacy of the Department of Health’s processes for de-identifying information for publication," Pilgrim said in a statement.
The news comes less than 24 hours after Attorney-General George Brandis announced plans to amend the Privacy Act to criminalise the re-identification of de-identified datasets.
The Privacy Commissioner has described proper de-identification of data - which has proved fallible many times in the past - as being akin to “rocket science”.
Updated 12:45 pm: Teague told iTnews that while the encryption used to protect practitioner details in the open database was not best practice, she had agreed not to provide details of her team's decryption method while copies of the dataset could still be circulating.
"There are plenty of good, well-studied algorithms out there for encrypting data securely - this wasn't one of them," she said.
In a case study, the researchers said they were able to perform a cryptographic attack by using the partial description of the dataset's linkable encryption algorithm that had been published on data.gov.au.
"Although neither the exact algorithm nor the details of subsequent processing were described in detail, we could guess those details for provider IDs and use the dataset to check our hypothesis," her team wrote.
"We were able to decrypt every service provider ID in the MBS dataset.
"Leaving out some of the algorithmic details didn’t keep the data secure – if we can reverse-engineer the details in a few days, then there is a risk that others could do so too."
What the researchers were left with after decryption was the numerical code assigned to each medical service provider by the government. They could not crack the encryption on patient data.
Teague pointed out that all health consumers are able to see their own doctor's practitioner number by looking up their personal Medicare claim history.
She acknowledged, however, that linking the information on a larger scale to doctors' names and other details would be an "extra step - but not necessarily a very hard step".
"What the government needs to do is use a well-studied public algorithm and make clear statements in advance about what type of encryption mechanisms they have in place, so there can be a scientific examination of those techniques before they are applied to real data," she said.
The team would not fall foul of the government's planned criminalisation of data re-identification as they notified the department a fortnight ago.
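As a rough illustration of what Teague's call for a "well-studied public algorithm" could look like in practice, the sketch below derives linkable pseudonyms from provider IDs with HMAC-SHA256 from Python's standard library. It is not the department's method, nor the researchers' attack, neither of which has been published; the function name `pseudonymise` and the sample IDs are hypothetical. Note also that a keyed hash is one-way pseudonymisation rather than reversible encryption, and it only holds up while the key stays secret.

```python
import hashlib
import hmac
import secrets


def pseudonymise(provider_id: str, key: bytes) -> str:
    """Map a provider ID to a fixed-length pseudonym using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so records
    stay linkable; without the key, the mapping cannot be recomputed
    by guessing candidate IDs.
    """
    return hmac.new(key, provider_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key, generated once and never published with the data.
key = secrets.token_bytes(32)

# Made-up provider IDs, for illustration only.
first = pseudonymise("1234567A", key)
again = pseudonymise("1234567A", key)
other = pseudonymise("7654321B", key)

print(first == again)  # same ID links to the same pseudonym
print(first == other)  # different IDs give different pseudonyms
```

Because the algorithm itself is public, anyone can scrutinise the design before it is applied to real data; the protection rests on the secrecy of the key, not of the method.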