NBN Co believes an outage on December 21 that knocked about 40 percent of Sky Muster users offline for about seven hours was caused by a micrometeorite colliding with one of its two satellites.
The network operator was hit by a pair of outages affecting parts of its Sky Muster satellite network in December and early January.
The first outage started at 8.30am on December 21, when Optus “confirmed an off-orbit condition of our second satellite, which we call our 1B satellite”, according to NBN Co’s chief development officer for regional and remote, Gavin Williams. Williams said Optus “effectively flies our satellites”.
Maxar Technologies, which owns SSL, the builder of the Sky Muster satellites, delivered its own post-incident report, which Williams said pointed to a brush with a micrometeorite.
He said the satellite is equipped with optical recognition technology “that saw some meteorite activity”, which appeared to confirm Maxar’s theory.
“It cannot be 100 percent characterised as this, but all the evidence points to a micrometeorite that impacted the satellite,” Williams told a senate estimates hearing on Tuesday night.
“What that did is effectively make the satellite’s body rotate whilst it remains in its orbit, so the satellite is no longer pointing at the appropriate spot on Earth.
“The transmission system on that satellite effectively switched off for that period.”
Williams said the problem impacted 46,500 customers out of a total user base of about 112,000, or about 40 percent of all users.
This was higher than earlier estimates that one in three users were impacted.
Williams said the satellite was able to recover and reorient itself towards Earth, but not before customers had their services cut for about seven hours.
The company then suffered a further issue when all the impacted customers began reconnecting to Sky Muster, as it emerged that 573 customers could not re-establish a connection at all.
A mitigation was found and applied to get the users back online from January 6, but it appears this particular issue is still being investigated.
“The issue was found to be caused by missing parameters in a configuration file for the customer premises devices - the boxes that NBN Co puts in the customer’s home,” Williams said.
“They get parameters from the network and those parameters were missing. Essentially some systems in the network had those parameters and knew that service was there, and other network elements didn’t believe that service was there.”
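The mismatch Williams describes, where some systems knew a service existed and others did not, can be illustrated with a simple consistency check. What follows is a minimal sketch only, not NBN Co's tooling: the system names, the service IDs and the function `find_mismatched_services` are all hypothetical, and it assumes each network element can report the set of services it knows about.

```python
from typing import Dict, Set


def find_mismatched_services(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each inconsistently known service ID to the systems missing it.

    `views` maps a (hypothetical) system name to the set of service IDs
    that system believes are active. A service is reported only if at
    least one system knows it and at least one other does not.
    """
    # Every service ID known to at least one system.
    all_services: Set[str] = set().union(*views.values())

    mismatches: Dict[str, Set[str]] = {}
    for service_id in all_services:
        # Systems whose view does not include this service.
        missing_from = {
            name for name, known in views.items() if service_id not in known
        }
        if missing_from:
            mismatches[service_id] = missing_from
    return mismatches


# Illustrative data only: two systems with differing views of three services.
views = {
    "provisioning": {"svc-001", "svc-002", "svc-003"},
    "load_balancer": {"svc-001", "svc-003"},
}
print(find_mismatched_services(views))  # {'svc-002': {'load_balancer'}}
```

A check of this kind only flags disagreement between systems; it says nothing about which system holds the correct view or why the parameters went missing.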
NBN Co said it had traced the issue to its load balancers, but that a permanent fix was still being worked on.
“There’s essentially a system in some load balancers that we believe was the root cause, and we’re now working with our partners in the United States to go down to the next level and the next level of the root cause to understand how we can make sure it doesn’t happen again,” Williams said.
“Meanwhile we’ve got some emergency patches [and] workarounds so that if we do identify this kind of thing happening again we can rapidly identify and recover services.”
The emergency patch meant NBN Co was able to avoid 573 individual truck rolls to the impacted premises to try to bring their services back online one by one, which was still a recovery possibility in early January.
Williams apologised to the 573 customers and said the fortnight-long outage was “unprecedented”.
“First and foremost, it’s not our finest hour,” Williams said.
“Our thoughts go out to around 600 customers who had their service impacted.
“It might sound trite to say it but service interruptions and managing networks is part and parcel of being a service provider, but this kind of outage deserves the classification of being unprecedented.
“It was a complex and - I don’t use the word lightly but an - unprecedented series of events.”