
Linguistics identifies anonymous users

By Darren Pauli on Jan 9, 2013 9:49AM

Researchers reveal carders, hackers on underground forums.

Up to 80 per cent of certain anonymous underground forum users can be identified using linguistics, researchers say.

The techniques compare users' posts to track them across forums, and could even unmask the authors of theses or blogs who have also posted on underground networks.

Aylin Caliskan Islam (left); Sadia Afroz (right)

"If our dataset contains 100 users we can at least identify 80 of them," researcher Sadia Afroz told an audience at the 29C3 Chaos Communication Congress in Germany.

"Function words are very specific to the writer. Even if you are writing a thesis, you'll probably use the same function words in chat messages.

"Even if your text is not clean, your writing style can give you away." 

The analysis techniques could also reveal botnet owners and malware tool authors, and provide insight into the size and scope of underground markets, making the research appealing to law enforcement.

To achieve their results the researchers used techniques including stylometric analysis, the authorship attribution framework JStylo, and Latent Dirichlet allocation, a topic-modelling method that can distinguish a conversation on stolen credit cards from one on exploit-writing and so help identify people of interest.
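The function-word idea can be illustrated with a minimal sketch. This is not JStylo's method or the researchers' feature set: the short word list, the cosine-similarity comparison and the names `FUNCTION_WORDS`, `profile` and `attribute` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption;
# real stylometric systems use far larger and more varied feature sets).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that",
                  "is", "it", "for", "but", "with", "on", "as", "i"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(unknown_text, known_texts):
    """Return the known author whose function-word profile is most similar."""
    target = profile(unknown_text)
    return max(known_texts,
               key=lambda author: cosine(target, profile(known_texts[author])))
```

Calling `attribute(sample, {"alice": alice_posts, "bob": bob_posts})` returns the name of the closest match. With only a handful of words per author the result is unreliable, which is consistent with the large text samples the researchers say they need.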

The analysis was applied across millions of posts from tens of thousands of users of a series of multilingual underground websites including thebadhackerz.com, blackhatpalace.com, www.carders.cc, free-hack.com, hackel1te.info, hack-sector.forumh.net, rootwarez.org, L33tcrew.org and antichat.ru.

It found up to 300 distinct discussion topics in the forums, with some of the most popular being carding, encryption services, password cracking and blackhat search engine optimisation tools. 

While successful, the work faces a series of challenges. Analysis could only be performed on users who had written at least 5000 words (this research used the "gold standard" of 6500 words), which culled the list of potential targets from tens of thousands to mere hundreds.
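The effect of the word-count threshold can be sketched as a simple filter. The `(user, text)` input shape, the whitespace-based word count and the name `eligible_users` are assumptions for illustration; the article does not say how the researchers counted words.

```python
from collections import defaultdict

MIN_WORDS = 5000  # minimum cited in the article; the research itself used 6500

def eligible_users(posts, min_words=MIN_WORDS):
    """Return, sorted, the users whose posts total at least min_words words.

    posts is an iterable of (user, text) pairs; words are counted by
    splitting on whitespace, which is an assumption of this sketch.
    """
    totals = defaultdict(int)
    for user, text in posts:
        totals[user] += len(text.split())
    return sorted(user for user, count in totals.items() if count >= min_words)
```

Raising `min_words` to 6500 shrinks the candidate pool further, which is the trade-off the researchers describe: more text per author, fewer authors left to analyse.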

The approach also needs to separate product information, such as listings for credit cards, exploits and drugs, from conversational text so that machine learning can automate the process, according to researcher Aylin Caliskan Islam.

Posts must also be translated into English, a step that lifted author identification accuracy from 66 to around 80 per cent but was imperfect with freely available tools such as those from Google and Bing.

However, both of these tasks were performed successfully, and further development, including the use of "exclusive" language translation tools, would only serve to boost identification accuracy.

Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated.

The project is ongoing and future work promises to increase the capacity to unmask users. This, Islam said, would include temporal information, which would exploit the fact that some users log into forums from the same IP addresses and write posts at around the same time of day.

Antichat user analysis

"They might finish work, come home and log in," Islam said.
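One way to picture the temporal signal is to compare when two accounts tend to post. The article does not describe how the researchers plan to use timing data, so the sketch below is an assumption throughout: it builds an hour-of-day histogram per account and scores two accounts by histogram intersection. The names `hour_profile` and `overlap` are hypothetical, and IP addresses are left out entirely.

```python
from collections import Counter
from datetime import datetime

def hour_profile(timestamps):
    """Fraction of an account's posts that fall in each hour of the day (24 buckets)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = len(timestamps) or 1  # avoid dividing by zero for empty input
    return [counts[h] / total for h in range(24)]

def overlap(p, q):
    """Histogram intersection: 1.0 for identical profiles, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))

# Two accounts that both post in the early evening score highly.
evening_a = [datetime(2013, 1, 1, 18, 5), datetime(2013, 1, 2, 19, 10)]
evening_b = [datetime(2013, 1, 5, 18, 30), datetime(2013, 1, 6, 19, 45)]
score = overlap(hour_profile(evening_a), hour_profile(evening_b))
```

A high score alone proves little, since many people post after work; a signal like this would only be useful in combination with others, such as writing style.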

It could also tie user identities to the topics they write about and produce a map of their interactions, identify multiple accounts held by a single author, and combine forum messages with internet relay chat (IRC) data sets.

"We want to automate the whole process."

Afroz said that while the work appeals to law enforcement and government agencies, it is not designed to catch users out.

"We aren't trying to identify users, we are trying to show them that this is possible," she said.

To this end, the researchers released tools last year, updated last December, which help users to anonymise their writing.

One tool, Anonymouth, takes a 500-word sample of a user's writing to identify unique features, such as function words, that could make them identifiable.

The other, JStylo, is the machine learning engine which powers Anonymouth.

The research team, from Drexel and George Mason universities, comprises Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt and Damon McCoy.


Copyright © SC Magazine, Australia
