How the Crime Commission is taking on unstructured data

ACID system under review.

The Australian Crime Commission is nearing the end of efforts to create a comprehensive picture of national criminal intelligence through better use of data, and will soon come to a decision about the future of its central criminal intelligence database.

The ACC is responsible for managing around 3.5 million documents contained within the Australian Criminal Intelligence Database (ACID) and the Australian Law Enforcement Intelligence Network (ALEIN).

ACID sits within ALEIN and serves as a single intelligence database that 20 law enforcement agencies across the country use as an information repository and analysis tool.

But its functionality has been limited within the ACC itself. The Commission is undertaking a scoping study, due to be finalised at the end of this year, that will determine the database's future.

The Commission also uses an internal network built by the agency's intelligence staff, the Fusion network, which holds around 1000 datasets collected by the agency. The Fusion initiative was announced in 2010 and funded with $14.5 million through to 2014.

Fusion is powered by a platform from US software vendor Palantir (favoured by US law enforcement agencies) and sits atop a number of disparate systems and datasets that previously weren’t connected.

Fusion has effectively replaced an earlier system that ACC CIO Dr Maria Milosavljevic described as a database modelled and transformed into a “sort of data warehouse”.

“Unlike Customs and Immigration, which are dealing with people doing transactional operations, we don’t generate any information ourselves,” she told iTnews.

“All the information we use is what we collect, so we have no control over it, and the variety of information is enormous.”

Milosavljevic said new datasets had to be cleansed and structured before being loaded into the data warehouse. Efforts to structure the datasets led to a classic information management dilemma: delays before the data was usable, and difficulty agreeing on a common taxonomy without fundamentally changing the raw data.

“We found that the waiting time to load data into the warehouse was huge,” she said, noting that the “rate the backlog was growing had doubled over five months”.

“Why are we adding structure to this information when doing so actually corrupts it? We couldn’t get the information in the system fast enough to have an effect.”

To speed up the process, the Commission shifted strategy: instead of adding structure to the data, the agency now simply adds meaning.

“You can represent a person in a database in many, many different ways, but in the end they’re all just a person,” she said. “The things that matter most are those that identify: a person’s name, address and date of birth are the most crucial to get right. You don’t want to get that search wrong and link data inappropriately.”

“What we were doing was forcing all the square pegs and oval pegs and star-shaped pegs into a diamond hole. What we’ve [since] decided was that it doesn’t matter: they’re all pegs, just find me all the pegs that have something in common. The data doesn’t need to be in all the same format.”
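The approach Milosavljevic describes amounts to linking records on a few identifying attributes rather than forcing every dataset into one schema. The Python below is a minimal sketch of that general idea under stated assumptions. It is not the ACC's or Palantir's implementation. The field aliases, date formats and function names are hypothetical, and it matches only on an exact normalised name plus date of birth (address is left out for brevity). Real entity resolution would also need fuzzy matching and analyst review to avoid linking data inappropriately.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, datetime

# Hypothetical aliases: each source names the same identifying fields differently.
NAME_FIELDS = ("name", "full_name", "subject")
DOB_FIELDS = ("dob", "date_of_birth", "birth_date")
# Hypothetical date formats seen across sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def parse_date(value: str) -> date | None:
    """Try each known format; return None if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalise_name(value: str) -> str:
    """Turn 'SMITH, John' or 'john  smith' into 'john smith'."""
    if "," in value:
        surname, _, given = value.partition(",")
        value = f"{given} {surname}"
    return " ".join(value.lower().split())


def identity_key(record: dict) -> tuple[str, date] | None:
    """Extract (name, dob) from whichever fields this record happens to use."""
    name = next((record[f] for f in NAME_FIELDS if f in record), None)
    dob_raw = next((record[f] for f in DOB_FIELDS if f in record), None)
    if name is None or dob_raw is None:
        return None
    dob = parse_date(dob_raw)
    if dob is None:
        return None
    return normalise_name(name), dob


def link_records(records: list[dict]) -> dict[tuple[str, date], list[dict]]:
    """Group records that share an identity key; the raw records are left unchanged."""
    groups: dict[tuple[str, date], list[dict]] = defaultdict(list)
    for record in records:
        key = identity_key(record)
        if key is not None:
            groups[key].append(record)
    return dict(groups)


# Three differently shaped records; A and B describe the same person.
records = [
    {"full_name": "SMITH, John", "date_of_birth": "1980-03-03", "source": "A"},
    {"subject": "john  smith", "dob": "03/03/1980", "source": "B"},
    {"name": "Jane Doe", "birth_date": "3 Mar 1980", "source": "C"},
]
links = link_records(records)
```

The design choice mirrors the "all pegs" idea: each record keeps its original shape, and only the identifying values are normalised for comparison, so nothing has to be restructured before it becomes searchable.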

The ACC IT team built a search assistant into the Fusion platform to allow analysts to sift through the data more quickly.

The Commission has been able to realise benefits from the system faster than expected, which Milosavljevic credits to the agile methodology used by the ACC's software developers.

“We adopted an agile approach and had some fairly honest conversations with analysts," she said.

"We said, ‘We are doing this R&D and we’re going to put things out and they may not be perfect. If you want them, please help us to test and get them ready’.”

Dealing with ACID

The Crime Commission is now looking at how best to mine the unstructured data that sits within the 3.5 million ACID records.

Most of the datasets are text, and the ACC is currently working with researchers from the University of Sydney on text analytics to get the most out of the data.

“Text is unbelievably difficult. It brings an enormous number of challenges. If you use Google to search something, how many times do you re-enter the phrase to try and pin down what you’re trying to find? It’s because what you’re dealing with is primarily unstructured data. That’s why data mining for text analytics is hard,” Milosavljevic said.

From next year, the Crime Commission's technology projects are expected to be led by a new CIO and newly appointed CTO Narelle Lovett.

Milosavljevic is acting in the role until a permanent appointment is made.

The agency advertised for the position in mid-September.
