Malware hidden in English language text

By Robert Blincoe

Dec 1 2009 12:13AM

How hackers could evade antivirus protection.

A team of US security researchers have engineered a way of hiding malware in sentences that read like English language spam.

The work is a breakthrough because current network security techniques assume that the code used in code-injection attacks, in which malicious code is delivered to and run on victims’ machines, has a different structure to non-executable plain data, such as English prose.
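To make that assumption concrete, here is a toy sketch of a filter that separates "code" from "text" by counting printable bytes. It is purely illustrative and is not how any particular security product works. The function names and the 0.95 threshold are assumptions made up for this example.

```python
import string

# Byte values that count as printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in data that are printable ASCII."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Naive check: treat data as harmless text if nearly every byte is printable.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return printable_ratio(data) >= threshold


# Ordinary prose passes the check.
prose = b"There is a major center of economic activity"
# A short run of typical binary machine-code bytes does not.
binary = bytes([0x90, 0x31, 0xC0, 0x50, 0x68, 0xCD, 0x80])

print(looks_like_text(prose))   # prose is entirely printable
print(looks_like_text(binary))  # most of these bytes are not printable
```

A payload made up entirely of English-looking words is fully printable, so a check built on this kind of assumption would let it through.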

One of the researchers, Dr Josh Mason of Johns Hopkins University, Baltimore, said the team wanted to broaden their understanding of how malicious code could be deployed, and highlight the need to design more efficient techniques for preventing this kind of attack altogether.

Dr Nicolas T Courtois, an expert in security and cryptology at University College London, said the work was an important paper in virusology, challenging an assumption that code has a different structure to non-executable plain data. He said malware deployed in this way would be “hard, if not impossible, to detect reliably.”

The research is a proof of concept, but Mason doubts any hackers are currently using the English language disguise technique for their code. “I'd be astounded if anyone is using this method in the real world owing to the amount of engineering it took to pull off,” he said. “A lot of people didn't think it could be done.”

Courtois says the paper has significant implications for technology companies, and argued that companies such as Intel should actually redesign their instruction set, to make this kind of attack easier to detect.

And Professor John Walker, managing director of forensics consultancy Secure-Bastion, argued the research highlights the flaws in the anti-virus community's approach to security exploits. “There is no doubt in my mind that anti-virus software as we know it today has gone well past its sell by date,” he said.

Walker consults for GCHQ and is sure hacking groups will be looking to leverage the technique.

The research paper, presented at the ACM Conference on Computer and Communications Security in Chicago, in November, is called English Shellcode - after the hacking community's generic name, shellcode, which refers to the payload portion of a code-injection attack.

This payload typically provides attackers with arbitrary control of system resources, applications, and data on a vulnerable machine. Attackers then choose how they want to continue their attack.

A tool that takes a piece of normal shellcode and generates some text to hide it could be the next step in the hacking and virus arms race. The advantage to hackers is simple. Alphanumeric shellcode can be stored in atypical and otherwise unsuspected contexts such as syntactically valid file and directory names or user passwords.

The challenge is that the alphanumeric character set is significantly smaller than the set of characters available in Unicode and UTF-8 encodings. This means that the set of instructions available for composing alphanumeric shellcode is relatively small. You couldn't have long strings of mostly capital letters, for example.
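A rough sense of how tight that restriction is can be had by counting byte values. The sketch below is only an illustration: it counts how many of the 256 possible byte values are ASCII letters or digits, and checks whether a payload stays inside that set. The helper names are hypothetical and do not come from the paper.

```python
import string

# Byte values for A-Z, a-z and 0-9: the only bytes alphanumeric shellcode may use.
ALPHANUMERIC = set((string.ascii_letters + string.digits).encode("ascii"))


def alphanumeric_fraction() -> float:
    """Fraction of the 256 possible byte values that are ASCII letters or digits."""
    return len(ALPHANUMERIC) / 256


def is_alphanumeric(payload: bytes) -> bool:
    """Return True if every byte in payload is an ASCII letter or digit."""
    return all(b in ALPHANUMERIC for b in payload)


print(len(ALPHANUMERIC))              # 26 + 26 + 10 = 62 usable byte values
print(is_alphanumeric(b"Hello123"))   # only letters and digits
print(is_alphanumeric(b"\x90\x90"))   # raw bytes outside the set
```

Only 62 of 256 byte values qualify, a little under a quarter, so any instruction whose encoding needs a byte outside that set is unavailable to the attacker.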

“There was really not a lot to suggest it could be done because of the restricted instruction set,” said Mason.

The team trained on a corpus of English text comprising roughly 15,000 Wikipedia articles and 27,000 books from Project Gutenberg.

The team can now generate English shellcode in less than one hour on standard PC hardware with 4GB of RAM.

Below is an example of an automatically generated English encoding. The text in bold is the instruction set and the plain text is skipped.

“There is a major center of economic activity, such as Star Trek, including The Ed Sullivan Show. The former Soviet Union. International organization participation”

Mason said that with a lot of work the quality of the English prose could be improved, but wouldn't really be worth the effort involved.

Mason worked with Dr Sam Small of Johns Hopkins University, Dr Fabian Monrose of the University of North Carolina, and Greg MacManus of iSIGHT Partners.

The paper is available here (http://www.cs.jhu.edu/~sam/ccs243-mason.pdf).

itweek.co.uk © 2010 Incisive Media
