Secret identities of phishers, spammers and online bullies could be exposed by a newly published data-mining technique designed to produce evidence that can be presented in court.

The method, which was recently accepted to the peer-reviewed journal Digital Investigation, identified patterns in vocabulary, punctuation and spelling to infer the gender, nationality and educational background of a message’s author.
It identified authors with an accuracy of 80 to 90 percent in a study of 100 emails from the now-defunct Enron Corporation, researchers from Canada’s Concordia University reported.
Study author Benjamin Fung said the technique could help authorities narrow their search for cybercriminals and pick out a perpetrator from a pool of suspects.
It was based on Fung’s previous work on grouping and analysing emails from the same author to extract identifying patterns and generate a “write-print”.
Write-prints were distinctive identifiers, like fingerprints, and could be used for comparing criminal emails to any writing samples obtained via law enforcement warrants.
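The idea of a write-print can be illustrated with a minimal sketch. It is not the Concordia implementation: the style items, the 60 percent support threshold, the function names and the sample messages are all assumptions made up for illustration, and it looks at single style items rather than richer combinations of patterns. It only shows the general shape of the approach: find style items that recur in one suspect's known writing but not in the others', then report which of them appear in an anonymous message.

```python
import re

def feature_items(text):
    """Map one message to a set of coarse, human-readable style items.
    The items below are hypothetical examples, not the study's feature set."""
    words = re.findall(r"[A-Za-z']+", text)
    items = set()
    if "!" in text:
        items.add("uses_exclamation")
    if "..." in text:
        items.add("uses_ellipsis")
    if re.search(r"\b(u|ur|pls|thx)\b", text.lower()):
        items.add("uses_texting_abbreviations")
    if words and sum(len(w) for w in words) / len(words) >= 5:
        items.add("long_words")
    if text.lstrip()[:1].islower():
        items.add("lowercase_opening")
    return items

def frequent_items(messages, min_support=0.6):
    """Items found in at least min_support of one author's messages."""
    counts = {}
    for message in messages:
        for item in feature_items(message):
            counts[item] = counts.get(item, 0) + 1
    return {i for i, c in counts.items() if c / len(messages) >= min_support}

def write_prints(samples, min_support=0.6):
    """Keep only the frequent items that no other author shares."""
    frequent = {a: frequent_items(m, min_support) for a, m in samples.items()}
    prints = {}
    for author, items in frequent.items():
        others = set().union(*(v for k, v in frequent.items() if k != author))
        prints[author] = items - others
    return prints

def rank_suspects(anonymous, prints):
    """Rank authors by matched items; the matched items serve as the reasons."""
    observed = feature_items(anonymous)
    matches = {a: sorted(p & observed) for a, p in prints.items()}
    return sorted(matches.items(), key=lambda kv: len(kv[1]), reverse=True)

# Invented writing samples for two hypothetical suspects.
samples = {
    "alice": [
        "hey, can u send the report pls!",
        "thx for that! see u soon",
        "pls call me!",
    ],
    "bob": [
        "Regarding the quarterly statement... I require additional documentation.",
        "Considering everything... the proposal appears reasonable.",
        "Unfortunately... scheduling remains problematic.",
    ],
}

prints = write_prints(samples)
ranked = rank_suspects("u need to verify ur account now!", prints)
for author, reasons in ranked:
    print(author, reasons)
```

Each match in the ranking comes with the list of style items that produced it, so the result can be read and questioned item by item instead of being taken on trust.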
Although experts might avoid generating identifiable write-prints, Fung expected most cybercriminals to leave subconscious clues, such as typographical errors and stylistic habits.
He said the technique should be combined with IP tracing to strengthen law enforcement capabilities.
“My method cannot replace an IP address,” he told iTnews, explaining that write-print comparisons would be particularly useful if emails were traced to a location that housed multiple people.
Fung described highly accurate pattern recognition methods such as the Support Vector Machine as “black box” methods that relied on multi-dimensional modelling and were too complex to be meaningful in courts of law.
By contrast, the Concordia technique was designed to match sets of data and give reasons that could be presented to, and understood by, judicial authorities.
“For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this,” he said.
Researchers will extend write-printing to chat logs and SMS.
Fung hoped it would be used by law enforcement in the “near future”, noting that his research group had worked with Canada’s National Cyber-Forensics and Training Alliance.