A surge in users claiming tax refunds overwhelmed a pair of routers in July last year, crippling ATO Online and the myGov online services portal for 16 hours.
The outage, which affected services from both the Australian Taxation Office and Services Australia, took place on July 12 2019, creating problems for those trying to submit their tax returns early.
Multiple systems, including ATO Online, the business and tax agent portals and the SBR2 reporting gateway, were either forced offline or severely degraded for more than 16 hours.
At the time, Services Australia declined to answer iTnews questions on the cause of the outage.
But a summary of the outage, provided by the ATO in answers to questions on notice from recent budget estimates, has now shed light on the incident.
The report reveals the “likely cause [of the data centre network fault that resulted in the outage] was increased network traffic”.
Services Australia also found this to be the cause of the outage, though it has refused to release any further details, citing security concerns.
“There was a significant increase in user demand on myGov services in July 2019, above forecasted demand predicted for the end of the financial year period,” Services Australia said in a separate answer.
One possible reason for this was the surge in early tax returns to take advantage of the government's generous 2019 tax refund.
The ATO traced the fault back to two routers supporting a legacy Teradata environment, which it had been waiting until after tax time to decommission because doing so required a “network wide outage”.
As a result of the July outage, however, the agency has already begun to decommission the routers, with the Teradata environment to follow between January and April 2020.
“The ATO data centre network supports numerous systems, including Teradata environments, these environments are scheduled to be refreshed in 2020,” the agency said.
“It is suspected that the increased load on the current Teradata environment from incoming and outgoing network data resulted in an increased network load and caused network segments to become unstable.
“The instability impacted how network traffic was routed and resulted in constant route changes.
“This caused data packets to not be received correctly and resulted in several ATO systems experiencing intermittent network outages.”
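The “constant route changes” the ATO describes are commonly known as route flapping. The ATO has not published how it detected or measured the problem, so the following is only a minimal sketch of one way such behaviour could be spotted, assuming a log of route updates is available. The function name, the event format (timestamp in seconds, destination prefix, next hop) and the thresholds are all hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


def find_flapping_routes(events, window_seconds=60, max_changes=3):
    """Return the prefixes whose next hop changed more than max_changes
    times within any span of window_seconds.

    events is an iterable of (timestamp, prefix, next_hop) tuples.
    This format is an assumption for illustration, not an ATO log format.
    """
    last_hop = {}                  # most recent next hop seen per prefix
    changes = defaultdict(deque)   # timestamps of recent changes per prefix
    flapping = set()

    # Process updates in time order.
    for timestamp, prefix, next_hop in sorted(events, key=lambda e: e[0]):
        previous = last_hop.get(prefix)
        last_hop[prefix] = next_hop
        if previous is None or previous == next_hop:
            continue  # first sighting, or the route did not change

        recent = changes[prefix]
        recent.append(timestamp)
        # Discard changes that have fallen out of the sliding window.
        while timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) > max_changes:
            flapping.add(prefix)

    return flapping
```

A route that switches next hop every few seconds would be flagged, while one that changes once every few minutes would not. In practice the window and threshold would need tuning to the network in question.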
With systems taking a minimum of 13 hours to restore, the ATO said the outage “highlighted the need for additional monitoring to identify impacts earlier and support faster diagnosis”.
The ATO has introduced a new monitoring system in the wake of the outage, which was ultimately resolved by rebooting the affected network devices.
The system, which was already under development when the incident occurred, is now being used to monitor “all planned features across all [network] devices”.
“This means that similar impact to the core network will be identified more quickly and monitoring plans adjusted accordingly,” the ATO said.
The agency also has plans to introduce an improved major incident management process from the end of January 2020.
It is using the outage as an opportunity to review its “current state data centre network architecture” for any residual risks.
The ATO secured $70 million in the April budget to begin preparations for its planned data centre move, as well as $151 million in the recent mid-year economic and fiscal outlook to improve data storage and security system resilience.