The Australian Taxation Office is actively trialling large multimodal AI models to help audit taxpayer-submitted documents as part of a broader push to “industrialise” AI across the agency by 2030.

The technology builds on the ATO’s existing use of open source AI, which for the past five years has been used to automatically read, classify and summarise supporting documents during audits.
But now, the office is preparing to extend those capabilities to non-text formats, including images, using emerging multimodal systems, according to ATO assistant commissioner for data science Ying Yang.
Speaking at the AI Innovation Showcase in Canberra, she said: “Why large multimodal instead of large language? The documents are not always in plain language.
"It can be images as well, so the team is right now playing with large multimodal models to see whether it will increase the performance of the document understanding.”
The ATO first piloted its document understanding tool in 2021 to help case officers sift through work-related expense claims after a taxpayer is selected for audit.
These cases are then verified by an ATO case worker, who will handle around 25 audits per year with an average of 147 pages of documents to assess.
The tool became fully operational in May 2024 [pdf] and works by extracting and prioritising key information from documents, breaking them down and ranking them by relevance to the client labels being identified.
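The ATO has not published the tool's internals, but the workflow as described, splitting documents into pieces and ranking those pieces against labels of interest, resembles a standard embedding-similarity pipeline. The sketch below is a minimal illustration of that idea only; the library, the "all-MiniLM-L6-v2" model and the function names are assumptions, not the ATO's implementation.

```python
# Hypothetical sketch only: the ATO has not published how its tool ranks documents.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def rank_chunks(pages: list[str], client_labels: list[str], top_k: int = 5):
    """Break extracted document text into chunks and rank them by
    similarity to the labels a case officer is looking for."""
    chunks = [c.strip() for page in pages for c in page.split("\n\n") if c.strip()]
    chunk_vecs = model.encode(chunks, convert_to_tensor=True)
    label_vecs = model.encode(client_labels, convert_to_tensor=True)
    # Score each chunk by its best match against any of the labels.
    best = util.cos_sim(chunk_vecs, label_vecs).max(dim=1).values
    ranked = sorted(zip(chunks, best.tolist()), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Example: surface the passages most relevant to a work-related expense label.
print(rank_chunks(
    pages=["Receipt: laptop purchase $2,100\n\nTravel diary: 14 nights in Perth"],
    client_labels=["work-related self-education expense"],
))
```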
According to ATO assistant commissioner Julia Webb, who also spoke at the AI Innovation Showcase, the tool was developed using an “enterprise learning loop”, a process of continuous human feedback to improve the AI’s performance.
“The machine is constantly learning and understanding why something might help,” Webb said.
"When a case worker directly requests to access the document, then they're able to provide feedback again, or whether it's relevant or whether it's not.”
Once the document review is complete, case officers can compare the summaries of allowed claims against the original claims.
These details inform an audit finalisation letter that is then issued to the taxpayer.
“After that, a case officer completes what we call an intelligence survey on the audit,” Webb said.
“[Details] include a client’s behaviour, the audit’s outcome, any unique claims made and any audit decisions.
“The intelligence is used to feed in and inform the machine learning function, providing insights into behavioural observations to continually evaluate and improve our selection of tactics as well.”
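Neither speaker described the loop's plumbing, but the pattern they outline (log a relevance judgement each time a case officer opens a document, feed the post-audit survey back in, then refit the model) is recognisable as standard human-in-the-loop feedback. Below is a minimal sketch under those assumptions; the function names, JSONL store and logistic-regression model are illustrative rather than the ATO's implementation.

```python
# Illustrative human-feedback loop; storage format and model choice are assumptions.
import json
import pathlib
from sklearn.linear_model import LogisticRegression

FEEDBACK_LOG = pathlib.Path("feedback.jsonl")

def record_feedback(doc_features: list[float], relevant: bool) -> None:
    """Log a case officer's judgement each time they open a document."""
    with FEEDBACK_LOG.open("a") as f:
        f.write(json.dumps({"x": doc_features, "y": int(relevant)}) + "\n")

def retrain() -> LogisticRegression:
    """Periodically refit the relevance model on the accumulated feedback."""
    rows = [json.loads(line) for line in FEEDBACK_LOG.read_text().splitlines()]
    X = [row["x"] for row in rows]
    y = [row["y"] for row in rows]
    return LogisticRegression().fit(X, y)

# Example: two judgements, then a refit of the relevance model.
record_feedback([0.9, 0.1], relevant=True)
record_feedback([0.2, 0.8], relevant=False)
model = retrain()
```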
Before reaching the audit stage, the ATO already makes extensive use of AI to assess the likelihood that work-related expense claims may be non-compliant, flagging potential issues before a return is selected for manual review.
“We have models running... to have a quick browse of their numbers and call out if there are any abnormal numbers and give [the taxpayer] a friendly nudge,” Yang said.
One example of this, she said, is when someone's claim is higher than the average for people in “similar circumstances”.
“Once we receive the tax return, we have risk models running in the background to call out inputs or claims that are deemed high risk, at least by machines at first,” she said.
However, Yang added, it’s only after a claim is reviewed by a case officer that a taxpayer is asked to provide supporting documentation.
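Yang did not describe the risk models themselves, and a "higher than average for similar circumstances" check could be implemented in many ways. One simple reading is a peer-comparison outlier test; the sketch below uses a basic z-score rule with hypothetical names, purely to illustrate the idea rather than to describe what the ATO actually runs.

```python
# Illustrative "friendly nudge" check, not an ATO model: flag a claim that is
# unusually high relative to taxpayers in similar circumstances.
from statistics import mean, stdev

def should_nudge(claim: float, peer_claims: list[float], threshold: float = 2.0) -> bool:
    """Return True if the claim sits more than `threshold` standard
    deviations above the average for the peer group."""
    if len(peer_claims) < 2:
        return False
    mu, sigma = mean(peer_claims), stdev(peer_claims)
    if sigma == 0:
        return claim > mu
    return (claim - mu) / sigma > threshold

# Example: a $9,000 claim against a peer group averaging roughly $450.
print(should_nudge(9000, [400, 450, 500, 480, 430]))  # True
```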
"Impactful and scalable”
The document understanding tool and learning loop form part of the ATO's five “very high value” enterprise use cases in its AI strategy, which was first outlined in 2022.
At the centre of this is a goal that by 2030, “the ATO is a leader in industrialising ethical, impactful and scalable AI solutions”, Yang said.
As part of this, the ATO intends to use AI for “client profiling”, which Yang described as creating an “integrated single source of truth for each of our clients.”
Currently, she said, different teams within the ATO use their own models to determine a person’s risk profile.
“How do we consolidate all these different perspectives of risk for a client so that everyone coming to the picture will have a consolidated understanding [of them]?” she said.
“That's the current focus of this enterprise use case.”
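Yang did not say how that consolidation will be built. One common pattern for a “single source of truth” over several team-specific risk models is simply to keep each model's score on a single client record and apply an agreed consolidation rule. The sketch below is hypothetical; the field names, model names and worst-score rule are assumptions for illustration only.

```python
# Hypothetical consolidated risk profile; model names and the consolidation
# rule are illustrative only, not the ATO's design.
from dataclasses import dataclass, field

@dataclass
class ClientRiskProfile:
    client_id: str
    scores: dict[str, float] = field(default_factory=dict)  # model name -> risk score

    def add_score(self, model_name: str, score: float) -> None:
        self.scores[model_name] = score

    def consolidated(self) -> float:
        """One simple consolidation rule: take the worst (highest) score."""
        return max(self.scores.values(), default=0.0)

profile = ClientRiskProfile("client-123")
profile.add_score("work_expense_model", 0.2)
profile.add_score("gst_model", 0.7)
print(profile.consolidated())  # 0.7
```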
Another key use case, meanwhile, is known as “fraud disruption”, which Yang described as a “very high priority” for the government.
While she didn’t provide details, the reference comes in the wake of revelations that more than $550 million in fraudulent claims were made over the two previous financial years by exploiting loopholes in the government’s digital identity systems.
The last use case involves using AI to help ATO workers understand different laws and precedents.
“The AI can start to understand how those decisions were made,” Yang added.
Yang noted that the ATO’s strategy deliberately omitted any mention of specific technologies or AI services, although as of June 2024, it had eight publicly available generative AI technologies approved for use.
These were: Microsoft Copilot; GitHub Copilot Visual Studio 2022 Extension for Business; Code Llama; Llama 2; Adobe Creative Cloud; OpenAI ChatGPT Team; IBM Cloud IaaS; and Copilot for Microsoft 365, according to the Governance of Artificial Intelligence at the ATO audit report [pdf].
Yang said the deliberate omission stemmed from the strategy’s original endorsement by the ATO in October 2022 – a month before the release of ChatGPT.
“I remember at that time, we were using something called a Bard, which is a Google model for language understanding,” she said.
“If I put the word [Bard] in my strategy, then it [was] outdated in one month because generative AI [became] the new message.
“That’s why we deliberately focused on the business value instead of hard-wiring into a single technology.”
Indeed, she added that selecting the model is “the least important question to ask if you want to industrialise AI”.
“AI is an ecosystem – it's not just AI modelling.”