CSIRO’s digital arm Data61 has come up with a new way to automatically identify phishing attempts, which it claims has a higher success rate than current techniques.

Data61 teamed up with UNSW and the Cyber Security Cooperative Research Centre (CSCRC) to develop novel algorithmic techniques that use file compression to spot phishing activity.
“Previous phishing detection methods employed machine learning algorithms that used traditional classification techniques like logistic regression, support vector machines, decision trees and artificial neural networks,” Data61 research scientist Dr Arindam Pal said on the digital agency’s Algorithm blog.
“These algorithms can’t cope with the dynamic nature of phishing, which often sees fraudsters constantly change the design and hyperlink of an illicit site every few hours.”
As a result, existing defences such as blacklists, content analysis platforms and web-based filters provide only limited protection before scammers develop new and more elaborate attacks - often faster than solutions can be designed to counteract them.
Pal said the new ‘PhishZip’ system uses the lossless DEFLATE file compression algorithm to compress both legitimate and phishing sites, separating them by examining how much they get compressed.
“Legitimate and phishing websites have different compression ratios.
“We then introduce a systematic process of selecting meaningful words which are associated with phishing and non-phishing websites and analyse the likelihood of those word occurrences, therefore calculating the optimal likelihood threshold.
“These words are then used as the pre-defined dictionary for our compression models and used to train the algorithm into identifying instances where a proliferation of these key words indicates a malicious website.”
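To illustrate the general idea Pal describes, the following is a minimal Python sketch, not Data61's implementation. It uses the standard library's `zlib` module (DEFLATE inside a zlib wrapper) with a preset dictionary, and compares how well a page's text compresses against a phishing word list versus a legitimate one. The word lists, the function names `deflate_ratio` and `looks_like_phishing`, and the threshold of 0.0 are all hypothetical placeholders chosen for illustration; the article does not publish PhishZip's actual dictionaries or threshold.

```python
import zlib

# Hypothetical word lists, invented for this sketch only.
PHISH_DICT = b"verify your account password login suspended confirm identity"
LEGIT_DICT = b"about company news careers privacy policy blog contact"


def deflate_ratio(text: str, dictionary: bytes = b"") -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    data = text.encode("utf-8")
    if not data:
        raise ValueError("text must not be empty")
    if dictionary:
        # zdict seeds the compressor with byte sequences it can refer back to.
        comp = zlib.compressobj(level=9, zdict=dictionary)
    else:
        comp = zlib.compressobj(level=9)
    compressed = comp.compress(data) + comp.flush()
    return len(compressed) / len(data)


def looks_like_phishing(text: str, threshold: float = 0.0) -> bool:
    """Flag text that compresses better with the phishing dictionary.

    The threshold is an assumed value; a real system would tune it on data.
    """
    score = deflate_ratio(text, LEGIT_DICT) - deflate_ratio(text, PHISH_DICT)
    return score > threshold


phish_text = "Please verify your account password to confirm identity. Login suspended."
legit_text = "Read our latest blog about company news, careers and the privacy policy."

print(looks_like_phishing(phish_text))
print(looks_like_phishing(legit_text))
```

Because both dictionaries add the same fixed overhead to the compressed output, the difference between the two ratios reflects only how many of the page's words each dictionary already contains.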
PhishZip has an advantage over machine learning-based models in that it doesn’t need model training or HTML parsing, the process of extracting information such as titles and headings from a webpage’s HTML code.
The PhishZip algorithm was tested on several phishing websites that clone PayPal, Facebook, Microsoft, ING Direct and other popular sites. It correctly identified 83 percent of phishing sites, which Data61 said is a marked improvement on current methods.
The researchers were also able to use the platform to contribute comprehensive phishing datasets to PhishTank, a community run by OpenDNS for people to share, verify and track phishing data.
The Australian Competition and Consumer Commission’s Scamwatch has received over 16,000 reports of phishing scams so far this year, totalling almost $600,000 in losses.
The CSIRO said there had been a significant increase in phishing activity over the last decade, with the outbreak of COVID-19 and resulting shift to working from home leading to even more instances.
“The technology could ultimately prevent significant financial losses for individuals and organisations,” Pal added.
Those interested in early access to the PhishZip project can contact Data61.