iTnews

Inside Telstra's network woes: what happened

By Allie Coyne on Apr 4, 2016 12:51PM

Execs detail cause of outages in bid to reclaim customer trust.

Telstra executives have opened up about the causes of three major outages that hit Telstra users in the past two months in an effort to regain the trust of the telco's vast customer base.

Australia's largest telco was forced to offer two 'free data' days to compensate users for three significant outages since the start of the year.

Yesterday's free data day saw customers slam the network to download a record 2686 TB in the 24-hour period, a 46 percent increase on the amount of data consumed during the February free data day.
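
As a rough check on those figures, the Python sketch below derives the February volume implied by the 46 percent increase, along with the average throughput across the 24 hours. It assumes decimal terabytes (10^12 bytes) and traffic spread evenly over the day, neither of which Telstra stated, so the results are estimates rather than reported numbers.

```python
# Back-of-the-envelope figures derived from the numbers quoted above.
# Assumptions (not from Telstra): 1 TB = 10**12 bytes, traffic spread evenly.

APRIL_TB = 2686          # record free data day volume, as reported
INCREASE = 0.46          # reported increase over the February free data day
SECONDS_PER_DAY = 24 * 60 * 60


def implied_february_tb(april_tb: float, increase: float) -> float:
    """Volume the earlier day must have carried for the stated increase."""
    return april_tb / (1 + increase)


def average_gbps(total_tb: float, seconds: int = SECONDS_PER_DAY) -> float:
    """Mean throughput in gigabits per second over the period."""
    bits = total_tb * 10**12 * 8
    return bits / seconds / 10**9


if __name__ == "__main__":
    print(f"Implied February volume: ~{implied_february_tb(APRIL_TB, INCREASE):.0f} TB")
    print(f"Average throughput: ~{average_gbps(APRIL_TB):.0f} Gbps")
```

Under those assumptions the February day works out to roughly 1840 TB, and the April day averages a little under 250 Gbps.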

Speaking to the CommsDay Summit today, and later to a media briefing at Telstra HQ, the telco's chief operations officer Kate McKenzie said the company was making every effort to ensure such incidents do not occur again.

She said the telco's initial review into the individual matters found the outages were not related, although two resulted from problems processing the mass registration of mobile devices.

McKenzie stressed that at no time did the Telstra network suffer a system-wide failure.

"[However] each of these events did impact varying numbers of our customers, and we are working to ensure this does not happen again," she said.

What went wrong?

On the morning of February 9, one of the signalling nodes used to manage Telstra's 3G and 4G data sessions and voice calls on its mobile network developed a fault, McKenzie said. At the time, the disruption was attributed to an "embarrassing" human error.

"With evidence of increasing degradation of the health of the node and potential service risk, a decision was taken to isolate the node from the network - a standard operating procedure for such an event," McKenzie said.

The node was removed from the network at around 12:30pm, about an hour and a half after the problems were first spotted, but further problems soon arose.

"Due to processes not being followed properly, the subsequent node restart initiated incorrectly," McKenzie said.

"This meant that 15 percent of all mobile devices connected through this node needed to re-register when establishing a new voice call or data session."

The mass re-registration of affected mobile devices overloaded other mobile signalling nodes, meaning customers were unable to make new voice calls or access data.
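
This kind of overload is often called a "thundering herd". The sketch below is a toy model, not Telstra's system: the device count, node capacity and retry windows are made-up values. It only illustrates why simultaneous re-registration produces a far higher peak load than attempts spread over a randomised window.

```python
import random
from collections import Counter


def peak_load(num_devices: int, window_seconds: int, seed: int = 0) -> int:
    """Return the busiest one-second load when each device re-registers
    at a uniformly random second within the window."""
    rng = random.Random(seed)
    arrivals = Counter(rng.randrange(window_seconds) for _ in range(num_devices))
    return max(arrivals.values())


if __name__ == "__main__":
    devices = 100_000            # hypothetical number of affected devices
    capacity_per_second = 5_000  # hypothetical node registration capacity

    for window in (1, 10, 60):
        peak = peak_load(devices, window)
        status = "overloaded" if peak > capacity_per_second else "within capacity"
        print(f"window={window:>2}s peak={peak:>6}/s -> {status}")
```

With a one-second window every device arrives at once, so the peak equals the full device count. Widening the window lowers the peak roughly in proportion.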

The telco decided to prioritise voice services over data services to get customers back online as quickly as possible, McKenzie said, claiming most affected data services were restored by 1pm. All services were restored at around 2:30pm, she said.

The employee responsible for the outage is still working for the company, McKenzie said.

"We're not into victimising people. We understand in the heat of the moment, the right decisions aren't always made," she said.

The next big outage - on March 17 - saw customers unable to make 2G, 3G and 4G voice calls or access data from around 6pm. Around 50 percent of Telstra users were affected.

The issue occurred when a significant number of international roaming customers were unexpectedly disconnected from the Telstra network. Domestic customers then followed.

The initial trigger was an international cable fault that caused parts of Telstra's signalling to disconnect and caused the readvertising of many IP addresses, network chief Mike Wright said.

McKenzie said automated attempts to reconnect all the affected users at the same time overloaded the database used to register devices, as in the previous outage.

"[We] limited the volume of 4G signalling for the devices reconnecting to the network, and configuration changes were made in the mobile network to speed up recovery. These changes reinstated network stability," she said.
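
One common way to cap signalling volume during a recovery is a token bucket. The sketch below is a generic illustration of that technique only. The article does not say which mechanism Telstra used, and the class name, rate and capacity are hypothetical.

```python
class TokenBucket:
    """Admit at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now          # time of the most recent refill

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=100, capacity=100)  # hypothetical limits
    # 1,000 reconnection attempts arrive in the same instant.
    admitted = sum(bucket.allow(now=0.0) for _ in range(1000))
    print(f"admitted at t=0: {admitted}, deferred: {1000 - admitted}")
    # One second later the bucket has refilled.
    admitted_later = sum(bucket.allow(now=1.0) for _ in range(1000))
    print(f"admitted at t=1: {admitted_later}")
```

Requests that are refused are not lost in this model. A device would simply retry later, which spreads the load on the registration database over time.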

Just five days later on March 22, a number of Telstra mobile, IP telephony and NBN voice customers were unable to make or receive voice calls for several hours in the morning, particularly around Victoria and Tasmania.

McKenzie said the incident only affected around 3 percent of customers.

The issue stemmed from a card failure in a media gateway in Victoria, preventing certain calls from getting through.

Review

Telstra has commenced a wide-ranging review of its network, led by McKenzie and drawing on "external experts" from around the world, as well as network partners Cisco, Ericsson and Juniper.

"We have already progressed short to medium term actions to improve resilience and robustness in the network," McKenzie said.

"Changes have been implemented to increase the capacity and path diversity of critical signalling channels, and a temporary layer of traffic management protection has been added to minimise the impact of events like that we saw in March and February."

Within the next few days the telco will augment capacity in its home location register - which manages customer subscription data - by adding a blade processor to help minimise the impact of mass re-registration of devices.

McKenzie pointed to the record 2686 TB of data consumed during its free data day on Sunday as evidence that Telstra's network was capable of handling strain.

"It's the re-registration of devices that is the problem," she said.

Immediate findings from the review will see the telco introduce configuration changes and new rules for interfacing, among other short and longer term changes.

Copyright © iTnews.com.au . All rights reserved.