
Eddie Sheehy is CEO of forensics firm Nuix.
The era of big data is placing investigators under increasing pressure as the data involved in investigations continually grows, becomes more complex and is stored in more places within an organisation.
This is stretching resources to capacity for internal investigators, data forensics specialists and information security managers. Everyone wants answers faster but traditional methods for evaluating electronic evidence are becoming unsustainable.
As a rule of thumb, the number of devices containing data involved in a typical investigation doubles every two years. In corporate environments, evidence can be stored in file shares, email databases, email archives and document management systems, to name a few.
These repositories have intricate ways of storing and embedding data multiple levels deep. They often use closed, proprietary formats that typically require a vendor-supplied software interface to read the information within them.
Despite these challenges, many investigators still insist on using traditional methods. They analyse each data repository separately using forensic tools, then manually try to correlate the evidence they uncover in each.
Because this approach takes a long time, investigators routinely turn away potential evidence because they lack the resources to examine it.
In addition, human sleuths, however brilliant, can’t hope to consistently and accurately cross-reference and find correlations across millions of data points. It is too easy to miss connections.
A more efficient approach
In recent years, we have seen investigators from the corporate and law-enforcement sides of the fence take a more efficient approach, known as content-based forensic triage.
This method involves collecting all available data in a single storage location, then using a combination of data management, analytical and forensic techniques to focus on the most critical evidence sources until the key facts emerge. I have seen this approach achieve similar or superior results when compared with traditional investigation methods, but in much less time.
Using this process, an investigator would first ingest all data sources into a single repository, then conduct a light metadata scan to tabulate information such as the sender, size, format and subject line of an email.
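To make the light metadata scan concrete, here is a minimal sketch in Python using only the standard library. It reads the headers of a single email and tabulates sender, subject, format and size without decoding the body. The function name scan_metadata, the field names and the sample message are illustrative assumptions, not part of any particular forensic product.

```python
from email.parser import HeaderParser


def scan_metadata(raw_message: str) -> dict:
    """Tabulate a few header fields without decoding the message body."""
    # HeaderParser parses only the headers and leaves the body as raw text,
    # which keeps this first pass cheap.
    msg = HeaderParser().parsestr(raw_message)
    return {
        "sender": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "format": msg.get_content_type(),
        "size_bytes": len(raw_message.encode("utf-8")),
    }


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly figures\n"
    "Content-Type: text/plain\n"
    "\n"
    "Numbers attached.\n"
)

row = scan_metadata(SAMPLE)
print(row)
```

Run over every message in the repository, rows like this one give the investigator a table to sort and filter before committing to any deeper processing.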
Even at this early stage, techniques such as network diagrams and timelines let investigators see connections and relationships between people and evidence.
Having identified the most likely evidence sources, investigators can then extract full text and metadata from them and analyse it with advanced investigative tools.
They can use pattern matching and regular expression searches to extract and highlight intelligence items – including names, email addresses, IP addresses, credit card numbers, bank account numbers and amounts of money.
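As a rough illustration of this kind of pattern matching, the Python sketch below pulls email addresses, IPv4 addresses, candidate card numbers and dollar amounts out of a block of text. The patterns are deliberately simplified assumptions and would need tuning for real evidence. The Luhn checksum used to filter card candidates is a standard technique that is added here for illustration.

```python
import re

# Simplified, illustrative patterns; real investigations need stricter ones.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
}


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_items(text: str) -> dict:
    """Find every pattern match, then discard card candidates that fail Luhn."""
    found = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    found["card"] = [c for c in found["card"] if luhn_ok(c)]
    return found


# Hypothetical text using a well-known test card number.
text = ("Wire $12,500.00 to j.doe@example.org from 192.168.0.12; "
        "card 4111 1111 1111 1111.")
items = extract_items(text)
print(items)
```

The checksum step matters because a loose digit pattern on its own will flag phone numbers, order references and other long digit strings as cards.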
Cross-referencing this intelligence across all available evidence can rapidly reveal relationships between people and entities, deliver points to prove and also offer broader intelligence. It brings to light connections that human investigators might miss.
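One simple way to picture this cross-referencing is an inverted index that maps each extracted item to the evidence sources it appears in. Any item found in more than one source is a potential link. The Python sketch below assumes the items have already been extracted, and the source names and values are hypothetical.

```python
from collections import defaultdict


def cross_reference(items_by_source: dict) -> dict:
    """Map each item to the sources it appears in, keeping only shared items."""
    index = defaultdict(set)
    for source, items in items_by_source.items():
        for item in items:
            index[item].add(source)
    # Only items seen in two or more sources link those sources together.
    return {item: sorted(srcs) for item, srcs in index.items() if len(srcs) > 1}


# Hypothetical extracted intelligence, keyed by evidence source.
evidence = {
    "laptop_A": ["j.doe@example.org", "192.168.0.12"],
    "phone_B": ["j.doe@example.org", "acct 00123"],
    "mail_archive": ["192.168.0.12", "j.doe@example.org"],
}

links = cross_reference(evidence)
for item, sources in links.items():
    print(item, "->", sources)
```

The index is built in a single pass over the data, which is what makes this feasible across millions of items where manual comparison is not.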
Content-based forensic triage in practice
We were involved in an investigation at a large company, which had identified its call centre as the source of leaked credit card data, but could not locate the weakness in the system.
Using our software to analyse the email patterns of employees who had access to credit cards, we quickly identified an employee who had sent dozens of images as a blind carbon copy (BCC) to an external email address at the end of each day.
These images turned out to be scanned photocopies of credit cards and identity documents, which the company required customers to send in for verification. The employee was emailing those details to an associate, who would then compromise the cards.
In another instance, a government agency investigated a company fraudulently selling aircraft that didn’t exist. The agency had seized approximately 40 devices, including desktop and laptop computers and smartphones, and recognised that it would need a team of up to 20 investigators to examine the available data using traditional methods.
Investigating each device sequentially would have made it impossible to locate links between different custodians and purchases.
Instead, the agency ingested all available data into a single storage location and used content-based forensic triage to index and cross-reference it. This meant a single investigator could quickly identify the most critical evidence, enabling the agency to bring charges.
In addition, by using near-duplicate functions to find similar documents, the investigator brought to light a series of related companies – unknown to the agency – conducting fraudulent transactions for aircraft parts, boats and other high-value products.
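How a near-duplicate function works inside any given tool is not described here, but one common general technique is to compare overlapping word sequences ("shingles") using Jaccard similarity. The Python sketch below shows that technique as an assumption, not as the method used in this investigation. The sample documents are invented.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in text (at least one entry)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: str, b: str) -> float:
    """Similarity from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


# Hypothetical documents: two templated invoices and one unrelated file.
doc_a = "invoice for sale of one light aircraft to the buyer"
doc_b = "invoice for sale of one motor boat to the buyer"
doc_c = "minutes of the annual staff meeting"

print(round(jaccard(doc_a, doc_b), 3))
print(round(jaccard(doc_a, doc_c), 3))
```

Documents built from the same template score well above unrelated ones, so ranking pairs by similarity surfaces reused paperwork even when names and products have been changed.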
As the challenges facing investigators evolve, so must the approach they use and the tools they apply to support it.
With the huge volumes of data involved in today’s investigations, traditional analysis methods are no longer sustainable. The content-based forensic triage approach ensures investigators don’t overlook critical evidence and helps them find answers faster.