AAPT Wholesale has identified a faulty rectifier as the possible cause of a partial power loss at its Flinders data centre in Melbourne on Tuesday.
The outage occurred in a DC-powered area of the data centre and disrupted business data and some voice services for up to 45 minutes.
Technicians at the facility had been carrying out non-hazardous works swapping out batteries in the centre's battery banks.
"When we were changing out the batteries for some reason on one side we popped a couple of rectifiers," CEO David Yuile told iTnews.
Rectifiers are electrical devices that convert alternating current (AC) to direct current (DC). DC power is favoured by telcos for stability reasons.
"When that happens, what you normally do is replace [the rectifiers] as quickly as you can," Yuile said.
"We replaced one of them, and as we were replacing it the DC [circuit] breaker tripped and we lost [power]."
Yuile said that power was cut to that part of the centre for eight minutes before it was restored.
Disruptions continued for up to 45 minutes after the cut, at which point support calls tailed off.
He said that AAPT "ran a little bit challenged" on the provision of voice services during those eight minutes, although the telco has approximately 14 voice switches nationwide.
The company's MPLS core data network failed over to AAPT's backup data centre in Richmond. AAPT runs twin data centres for redundancy in each capital city.
"Obviously people with direct connections that were dedicated and weren't redundant were [more heavily] impacted," he said.
Customers received an AAPT fault report (AFR) late yesterday.
It was expected to point to the rectifier as the cause, although more forensic work is required by power technicians before the cause is definitively known.
"We're still doing the wash-up," Yuile said. "Because DC plant is quite specialised, we have to get the suppliers in."
One issue to be explored in the post-incident investigation is how a fault in a replacement rectifier went undiagnosed.
AAPT keeps replacement supplies of rectifiers on hand and these are tested before being put into the live network.
Yuile apologised to customers who were affected.
"We take these things very seriously," he said.
"It's important for us [now to focus on] what we do from a design point of view or an operational point of view ... to make sure nothing like this happens again."