Academics get personal over big data

Scholars at Princeton University have delivered a stinging rebuke to the 'big data' movement, insisting that today's data de-identification tools are not sufficient to ensure privacy.

Assistant Professor Arvind Narayanan and Professor Edward Felten have published an academic paper titled 'No silver bullet: De-identification still doesn't work' [pdf]. It ridicules the research methodologies of a paper published last month by ITIF researcher Daniel Castro and Ontario privacy commissioner Ann Cavoukian, which had concluded the opposite.

Marketers across the globe are building tools to take advantage of the large volumes of data created by web sites and the use of digital devices such as smartphones, with half an eye on a new generation of embedded digital devices in everything from automobiles to consumer goods.

Steady improvements in computer processing power and distributed file systems have helped marketers gain far richer insights into large data sets at far greater speed.

While most agree that big data tools provide economic and social value, theorists are split on whether there are sufficient legal and technical frameworks available to ensure that insights can be drawn from large, aggregated sets of data without infringing on an individual’s right to privacy.

Data de-identification, the storing and sharing of data in such a way that the identity of any one individual can’t be ascertained from the broader data set, has become one of the latest battlegrounds on the subject.

Castro and Cavoukian’s June paper concluded that the risk of re-identification of an individual from a de-identified data set has been “greatly exaggerated”, inflamed by researchers who haven’t used tools effectively and blown out of proportion by the media.

“Contrary to what misleading headlines and pronouncements in the media almost regularly suggest, datasets containing personal information may be de-identified in a manner that minimises the risk of re-identification, often while maintaining a high level of data quality.”

- Big Data and Innovation: Setting the Record Straight: De-identification does work [pdf]

The authors argued that much of the research on re-identification had either failed to provide enough proof that individuals could be re-identified from the data sets queried; had matched the data with third-party data sets to achieve its aims; or had applied specialist knowledge that isn’t readily available in the ‘real world’. They were concerned that policy makers may feel compelled to regulate in ways that reduce the utility of these data sets.

"While data is typically collected for a single purpose, increasingly it is the many different secondary uses of the data wherein tremendous economic and social value lies. For example, recent studies have shown that large-scale mobile phone data can help city planners and engineers better understand traffic patterns and thus design road networks that will minimize congestion. De-identifying the data is one way to enable its reuse by third parties.

"While it is not possible to guarantee that de-identification will work 100 percent of the time, it remains an essential tool that will drastically reduce the risk of personal information being used or disclosed for unauthorized or malicious purposes."

- Big Data and Innovation: Setting the Record Straight: De-identification does work [pdf]

In reply, Narayanan and Felten methodically pulled apart this defence of the efficacy of big data de-identification, listing eight problems with the June paper and insisting that data de-identification is ‘no silver bullet’ for ensuring privacy.

"There is no evidence that de-identification works either in theory or in practice and attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do."

- No silver bullet: De-identification still doesn't work [pdf]

The Princeton scholars’ paper listed many examples of where a motivated actor could readily combine aggregated data sets from mobile network connections or web site clicks with information easy to obtain from elsewhere to identify an individual.

It further squashed the notion that only a handful of individuals have the tools to re-identify data, citing the tens of millions of people qualified in software development as more than capable.

"Most “anonymized” datasets require no more skill than programming and basic statistics to de-anonymize."

- No silver bullet: De-identification still doesn't work [pdf]
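To make the disputed technique concrete, the following is a minimal sketch of a linkage attack of the general kind described above: a "de-identified" table is joined to a second, named data set on the attributes the two share. It is not taken from either paper. Every record, every field name and the `link` helper are invented for illustration, and real data sets are larger and noisier than this toy.

```python
# Toy linkage attack. All records and field names are hypothetical.

# A "de-identified" release: names removed, quasi-identifiers retained.
deidentified = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3000", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
    {"postcode": "3000", "birth_year": 1980, "sex": "F", "diagnosis": "eczema"},
]

# Auxiliary data an adversary might obtain elsewhere, with names attached.
public = [
    {"name": "Alice Example", "postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "postcode": "2000", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "postcode": "3000", "birth_year": 1980, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link(deid_rows, aux_rows, keys=QUASI_IDENTIFIERS):
    """Return {name: diagnosis} for people who match exactly one record."""
    # Group the de-identified rows by their quasi-identifier values.
    index = {}
    for row in deid_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = {}
    for person in aux_rows:
        candidates = index.get(tuple(person[k] for k in keys), [])
        # Only a unique match re-identifies someone with confidence.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(deidentified, public)
for name, diagnosis in reidentified.items():
    print(name, "->", diagnosis)
```

In this toy, two of the three named people match exactly one record and are re-identified, while the third shares her attributes with another record and stays ambiguous. How often real records are unique in this way is precisely what the two papers disagree about.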

The scholars contend that organisations need to invest their efforts in emerging techniques, such as differential privacy, and be prepared to make some trade-offs in utility and convenience in the interests of privacy.

In the absence of better alternatives, they argue that policy makers may have to “use legal agreements to limit the flow and use of sensitive data.”
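For readers unfamiliar with differential privacy, one of the emerging techniques the scholars point to, the sketch below shows one textbook form of it, the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions rather than anything prescribed by the paper: the function names, the sample ages and the choice of `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on zero."""
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Answer "how many records satisfy predicate?" with added noise."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight people.
ages = [23, 35, 41, 29, 52, 47, 38, 31]
noisy = private_count(ages, lambda age: age > 40, epsilon=1.0)
print("noisy count of people over 40:", round(noisy, 2))
```

The trade-off the scholars describe is visible in the `epsilon` parameter: a smaller value adds more noise, which strengthens the privacy guarantee but makes each answer less accurate.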

Australian organisations are subject to the Privacy Act, which compels organisations to gain consent from users and be explicit, at the point of collection, about why they are collecting personally identifiable information (PII).

The regulatory function set up to oversee these activities, however, was part of the Office of the Information Commissioner, which has been disbanded by the Abbott Government. With few resources available, Australia’s Privacy Commissioner has yet to wield new powers granted under amendments to the Privacy Act introduced in March.
