When can you say you have aggregated and obfuscated a dataset so much that it is no longer personally identifiable?
And which datasets have been de-identified to an acceptable point to make them shareable and useable without attracting the wrath of the Privacy Act?
One of the biggest headaches facing data professionals right now, according to NSW’s chief data scientist Dr Ian Opperman, is that pretty much every organisation you deal with will have a different answer to these questions.
Opperman is leading the state government’s Data Analytics Centre (DAC), set up to broker data swaps to answer public policy's trickiest conundrums.
But he says that “every company we work with, every agency we work with, everybody treats this question differently”.
“There is no nationally accepted test for the presence of personally identifiable information.”
Opperman is on a campaign to bring a bit more scientific rigour to the ad-hoc process of deciding what can and cannot be shared in the data sphere.
“We ask agencies if we can have their data and they say ‘no, it’s personal’,” he told Sydney’s CDO Summit last week.
“So we say what if we aggregate it over three months? No. Six months? No. 12 months? No. 24 months? Maybe.
“You have to ask where do we cross the line?”
And it’s not a frustration unique to the DAC: Standards Australia, the ACS, Mastercard, IBM, Telstra and representatives from the Commonwealth, NSW, Victorian and Queensland governments have all joined forces to form a data taskforce dedicated to nutting out this issue.
The group met last week in Opperman’s Sydney offices, as its members work incrementally towards a more consistent system.
“We are trying to chew on a few problems which fundamentally get down to: where do we get to the point where data becomes personalised, and how can we create service types based on different sorts of data, which allow us to understand our limitations, obligations and responsibilities,” he said.
“Once we crack that we are off to a brand new future where we can do some extraordinarily powerful things and use the power of that data to address truly wicked challenges.”