Health open data bungle meant Aussies could be identified

Privacy commissioner investigating.

Researchers from the University of Melbourne have been able to easily re-identify patients from confidential data released by the federal Health department, without using decryption methods.

Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague found that de-identified Australian Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) claims data released to the public in August 2016 can be used to re-identify the patients involved.

The dataset included the de-identified medical billing records of 2.9 million people, or 10 percent of all Australians, from 1984 to 2014. It also included year of birth, gender, and medical events data.

It was published on the department's open data portal. Only supplier and patient IDs were encrypted.

The dataset was removed by the Health department in September 2016, just a month after it was published, after the same researchers pointed out that the practitioner details could be decrypted.

The highly publicised breach spurred the government to attempt to legislate against individuals and businesses who re-identify open public sector data, wanting them to face up to two years' jail and hefty fines.

The bill has stalled in the Senate due to a lack of support from Labor and the Greens.

However, the researchers have now revealed that more supposedly de-identified information from the dataset can be re-identified.

They notified the department of their findings in December 2016 but only published their research today, partly to give Health time to deal with what they had found.

An individual can be matched to their record simply by using known information about the person, like their year of birth and medical procedures, the researchers discovered.
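The matching described above can be illustrated with a minimal sketch. It is not the researchers' code: the records, field names and the `candidates` helper are all invented for illustration, and the toy data assumes only the attributes the article says were released (year of birth, gender and dated medical events). Each extra known fact narrows the set of matching records until one remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str      # encrypted ID, meaningless on its own (hypothetical values)
    year_of_birth: int
    gender: str
    events: frozenset    # (year, procedure) pairs; invented toy data


def candidates(records, year_of_birth, gender, known_events):
    """Return the IDs of all records consistent with the known facts."""
    known = set(known_events)
    return [
        r.patient_id
        for r in records
        if r.year_of_birth == year_of_birth
        and r.gender == gender
        and known <= r.events   # every known event appears in the record
    ]


records = [
    Record("a91f", 1975, "F", frozenset({(2009, "childbirth"), (2012, "knee surgery")})),
    Record("c07e", 1975, "F", frozenset({(2009, "childbirth")})),
    Record("e553", 1975, "M", frozenset({(2012, "knee surgery")})),
    Record("b2d4", 1980, "F", frozenset({(2009, "childbirth"), (2012, "knee surgery")})),
]

# Year of birth and gender alone leave two candidates.
broad = candidates(records, 1975, "F", [])

# Adding two publicly known events leaves a single record.
narrow = candidates(records, 1975, "F", [(2009, "childbirth"), (2012, "knee surgery")])

print(broad)
print(narrow)
```

Once a single record remains, every other event in it is linked to the named person, even though the patient ID itself was never decrypted.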

They were able to match patient records to seven prominent Australians - including three former or current MPs and an AFL footballer - using publicly available information online.

The researchers said this re-identification was "straightforward for anyone with technical skills about the level of an undergraduate computing degree".

Rubinstein said that where this method was not successful, matching could still be achieved by cross-referencing the MBS and PBS records with other commercially available datasets like bank billing data.

While the researchers said access to high quality and occasionally sensitive data was a "modern necessity" for research, the problem of delivering this access while protecting people's privacy had not been solved.

“Open publication of de-identified records like health, census, tax or Centrelink data is bound to fail as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Teague said.

“We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data.”

Teague told iTnews she was not concerned about the impact that publishing her team's research could have should the government's re-identification bill be passed. The law would apply retroactively from September 29 2016, the day after the researchers informed Health of the first privacy issue with the dataset.

"I would like to think that it is now clear, even to those who initially supported it, that a law against demonstrating that the government messed up their maths is not going to improve the science nor permit the sort of open political discussion Australians are supposed to have," Teague said.

Privacy commissioner investigating

The Health department said it had referred the matter to privacy commissioner Timothy Pilgrim.

It said it was not aware of any individual being identified by the dataset outside of the researchers' efforts.

"This matter dates back to 2016 and since then the Australian government has taken further steps to protect and manage data," a spokesperson said.

"The project was halted and remains halted, and the dataset was removed immediately. The department is working with the University of Melbourne and has already acted to improve its processes."

Pilgrim's office said it had been investigating the matter since Health first notified the office of the potential privacy issues in September last year.

It declined to comment given the investigation is ongoing, but said Pilgrim would make a public statement once the probe was complete.

"Realising the value of public data to innovations that benefit the community at large is dependent on the public’s confidence that privacy is protected," a spokesperson said.

"The OAIC [Office of the Australian Information Commissioner] continues to work with Australian government agencies to enhance privacy protection in published datasets."
