Tabcorp has revealed its November 2020 outage was caused by a data centre air conditioner malfunction that triggered the room’s fire suppression system, the noise of which damaged its IT equipment.
The weekend outage impacted TAB, Keno and gaming services operations, and caused key race meetings to be either delayed or postponed.
It was initially attributed to a “likely fire incident” in Tabcorp’s rented space in the Global Switch data centre in inner Sydney, a description that the data centre operator took issue with.
The issue was then attributed to a mechanical plant “malfunction” at the data centre, which tripped fire suppression systems in a single data hall and damaged some of Tabcorp’s server infrastructure.
Tabcorp said it had now completed a “comprehensive review” of the incident, which revealed the chain of events that knocked its systems offline.
News of the review, which is understood to have been produced by Deloitte, was first published by The Sydney Morning Herald.
“A comprehensive review found a faulty bearing in the data centre air conditioner unit caused friction, heat, smoke and the activation of fire suppression gas, generating significant noise,” Tabcorp said.
“This was an unprecedented event that instantly and catastrophically damaged highly sensitive Tabcorp hardware.
“This occurred in the part of the third party-owned data centre that Tabcorp uses. Prior to the incident, there was no indication that any mechanical failure was imminent.”
The air conditioning unit is believed to have been Global Switch's rather than anything installed or operated by Tabcorp.
Global Switch uses the Inergen gas suppression system, according to its technical specifications.
There have been data centre incidents in the past where the pressure of the gas release caused vibrations so severe that they damaged the racked equipment.
These include a planned test of the suppression system at an ING Bank facility in 2016 where the noise of the gas release damaged all the disk storage in the room.
A similar ‘shockwave’ incident was blamed for an outage experienced by Glasgow City Council in 2015.
In its own post-mortem, Tabcorp appeared to indicate the air conditioning unit that tripped the suppression system was not overworked.
“This was not the result of excessive customer load,” it said.
Tabcorp said it had taken steps at the data centre “to address the specific incident including upgrades to hardware.”
“Further steps are underway to uplift disaster recovery capability [at the facility],” it said.
Questions have also been lodged with Global Switch on what lessons it took from the incident, as well as any actions it took to compensate Tabcorp.
Tabcorp’s wagering and media managing director Adam Rytenskild said in a statement that “it was very unfortunate the outage occurred and especially on such a big day for our customers, for TAB and the racing industry.”
“We have responded and are taking steps to minimise the chance of anything like that happening again,” he said.