Telstra outage falls to procedural error

 

Analysis: Were the proper procedures in place?

Network engineers have speculated that router configuration and route-limiting procedures on Telstra equipment were to blame for network outages that affected millions of services yesterday.

Seen as one of Australia's worst in recent years, the 35-minute outage saw Telstra routers incorrectly redirect traffic from its own subscribers, as well as those of its wholesale customers, through ISP Dodo's routers and into effective dead ends.

Millions were affected on Telstra's domestic networks as well as those operated by iiNet, Optus, the four major banks and multiple large enterprises.

NBN Co, also a peering partner on the affected router, did not respond to questions at time of writing.

Dodo CEO Larry Kestelman confirmed the outage was the result of "a hardware issue with a Cisco border router" on his company's network which attempted to change the way it routed traffic for its subscribers to the internet.

It is thought Dodo effectively "advertised the entire internet", made up of approximately 400,000 routing prefixes. These were accepted by Dodo's bandwidth wholesaler, Telstra, and propagated to all other Telstra customers accessing the internet through the telco's AS1221 network.

One observer, Michael Keating, explained that "any traffic destined to go overseas, was in fact effectively being routed back towards Dodo".

"This meant that Telstra lost the ability to communicate data overseas, the local network became saturated with data and became unstable, and anyone using Telstra for international capacity suddenly stopped working," he said.

Dodo's Kestelman yesterday said that, "in normal circumstances, this would not result in a network outage.

"However, it appears that these routes were accepted by Telstra and propagated to Telstra's downstream customers rather than Telstra simply filtering the routes.

"This caused major issues for Telstra and its customers which should have been avoided."
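The behaviour the engineers describe can be illustrated with a minimal Python sketch. It assumes, as is common carrier practice but not confirmed for Telstra, that routes learned from a customer are preferred over routes learned from international transit. The `Route` type, the neighbour names and the preference values are hypothetical, and the prefixes are documentation ranges rather than real routes.

```python
from dataclasses import dataclass

# Hypothetical preference values: the higher number wins.
# Assumption: customer-learned routes outrank transit routes.
CUSTOMER_PREF = 200
TRANSIT_PREF = 100


@dataclass(frozen=True)
class Route:
    prefix: str        # destination prefix
    learned_from: str  # neighbour that advertised it
    local_pref: int    # locally assigned preference


def select_paths(routes):
    """Group routes by prefix and keep the most preferred one for each."""
    by_prefix = {}
    for route in routes:
        by_prefix.setdefault(route.prefix, []).append(route)
    return {
        prefix: max(candidates, key=lambda r: r.local_pref)
        for prefix, candidates in by_prefix.items()
    }


overseas = ["198.51.100.0/24", "203.0.113.0/24"]

# Normal state: overseas prefixes are reached via international transit.
normal = [Route(p, "international-transit", TRANSIT_PREF) for p in overseas]

# Leak: the customer advertises the same prefixes and nothing filters them.
leaked = [Route(p, "customer", CUSTOMER_PREF) for p in overseas]

before = select_paths(normal)
after = select_paths(normal + leaked)

for prefix in overseas:
    print(prefix, before[prefix].learned_from, "->", after[prefix].learned_from)
```

Under these assumptions every overseas prefix switches from the transit neighbour to the customer as soon as the leaked routes are accepted, which matches the observation that international traffic was being sent back towards Dodo.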

Though much of the blame was placed on Dodo's network issues yesterday, network engineers told iTnews that Telstra shared responsibility for failing to filter or limit the number of routes Dodo purported to provide for access to the internet.

Engineers said that, in a best practice situation, Telstra would limit the number of prefixes Dodo could advertise to the network, using a configuration feature that has been available on Cisco and other routers for more than ten years.
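The logic of such a prefix limit can be sketched in a few lines of Python. This is an illustration only, not Telstra's or Cisco's implementation: the `CustomerSession` class, the limit of 100 and the prefix labels are all hypothetical. The idea is that a session which announces more prefixes than agreed is shut down and its routes discarded before they can be passed on.

```python
class PrefixLimitExceeded(Exception):
    """Raised when a neighbour announces more prefixes than permitted."""


class CustomerSession:
    """Hypothetical model of one routing session with a prefix cap."""

    def __init__(self, name, max_prefixes):
        self.name = name
        self.max_prefixes = max_prefixes
        self.accepted = set()
        self.up = True

    def receive(self, prefix):
        if not self.up:
            raise PrefixLimitExceeded(f"{self.name}: session is down")
        self.accepted.add(prefix)
        if len(self.accepted) > self.max_prefixes:
            # Over the cap: drop everything learned and close the session.
            self.accepted.clear()
            self.up = False
            raise PrefixLimitExceeded(f"{self.name}: limit exceeded")


def feed(session, prefixes):
    """Deliver announcements; return True if the session stayed up."""
    try:
        for prefix in prefixes:
            session.receive(prefix)
    except PrefixLimitExceeded:
        return False
    return True


# A customer announcing its usual handful of prefixes stays connected.
steady = CustomerSession("customer", max_prefixes=100)
steady_ok = feed(steady, (f"prefix-{i}" for i in range(50)))

# A customer announcing roughly 400,000 prefixes trips the limit early.
leaking = CustomerSession("customer", max_prefixes=100)
leaking_ok = feed(leaking, (f"prefix-{i}" for i in range(400_000)))

print(steady_ok, len(steady.accepted))
print(leaking_ok, len(leaking.accepted))
```

In this sketch the leaking session is closed on its 101st announcement, so none of its routes remain to be propagated downstream.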

Bad practice

Geoff Huston, chief scientist at the Asia Pacific Network Information Centre (APNIC), said Telstra used to employ prefix filters but he could not ascertain whether they were still in use on any of the routes.

Recent practice in Telstra and other Australian carriers had been to rely on administrative processes and "trust" to implement the Border Gateway Protocol, the underlying technology that led to yesterday's issues.

"I suspect that we're not as careful as we should be with the use of routing databases - in fact Australia is pretty bad in its use of routing databases," he said.

"Certainly ten years ago, we did use customer-level filters. Something like Dodo couldn't have happened then."

Vocus managing director James Spenceley said such filters are still common practice among other Australian carriers but seemed not to be in use, at least in Dodo's case.

"Filtering has been absolutely mandatory best practice since the 90s," he said. "We've all invested our time and money to make sure we can do it."
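A customer-level filter of the kind Huston and Spenceley describe can be sketched as an allow-list check. The example below is a simplified illustration that assumes the carrier already holds a list of prefixes registered to the customer, for instance drawn from a routing database. The function names, the registered prefixes (documentation ranges) and the /24 length cut-off are assumptions, not details from either carrier.

```python
import ipaddress


def build_filter(registered):
    """Parse the customer's registered prefixes into network objects."""
    return [ipaddress.ip_network(p) for p in registered]


def is_permitted(announcement, allowed, max_len=24):
    """Accept an announcement only if it sits inside a registered prefix.

    max_len is a hypothetical cut-off that rejects overly specific routes.
    """
    net = ipaddress.ip_network(announcement)
    if net.prefixlen > max_len:
        return False
    return any(
        net.version == a.version and net.subnet_of(a)
        for a in allowed
    )


# Hypothetical registration data for one customer.
allowed = build_filter(["192.0.2.0/24", "198.51.100.0/24"])

announcements = [
    "192.0.2.0/24",    # registered to the customer
    "203.0.113.0/24",  # someone else's address space
    "0.0.0.0/0",       # a default route
    "192.0.2.128/25",  # inside a registered block but too specific
]

results = {a: is_permitted(a, allowed) for a in announcements}
for announcement, ok in results.items():
    print(announcement, "accept" if ok else "reject")
```

With a filter like this, only the customer's own registered prefix is accepted; routes for the rest of the internet are rejected at the edge regardless of how many are announced.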

A post-incident report is expected from Dodo and Telstra in coming days. It is likely to reveal how much of the blame for the outage rests with each party and whether any changes to administrative processes will be recommended to prevent a repeat.

A Telstra spokesman confirmed "steps were taken [Thursday] to add additional protection to our core network.

"A full review of Telstra’s network protection mechanisms is being undertaken," he said.

Some of the service providers affected by yesterday's outage said they would look to diversify peering arrangements in future to minimise the impact on their respective customer bases.

Huston urged Australian providers to become more practised in the use of routing databases in the meantime.

"Maybe we'd all be better off at protecting ourselves from these kinds of slip-ups which, let's face it, always will happen sooner or later," he said.

Update: Added updated Telstra statement and fixed minor inaccuracies.

Copyright © iTnews.com.au. All rights reserved.

