Amazon offers explanation to recent outages

 

Elastic load balancer bug caused flood of requests.

Failed generators and a lengthy server reboot process exacerbated outages to Amazon Web Services over the weekend.

A detailed explanation of the outage released by the cloud provider this week attempted to minimise the event — saying a mere seven percent of virtual machine instances in the Virginia availability region were affected by the incident. However, the company conceded the recovery time for those affected was extended by a bottleneck in its server reboot process.

The outage on June 29, caused primariily by a huge thunder storm passing through Virginia, saw several large internet sites including Netflix, Instagram and Reddit fail, and led to widespread criticism of Amazon’s hosting service.

While backup equipment at most of the ten data centres in Amazon’s US East-1 region kicked in as designed — allowing the facility to cope with mains grid power fluctuations caused by the storm — the generators at one site failed to provide stable voltage.

As the generators failed, battery-backed uninterruptible powers supplies took over at the data centre keeping servers running. However, a second power outage roughly half an hour late meant that the already depleted UPSs started to drop off, and servers began shutting down.

Power was restored 25 minutes after the second outage but Elastic Cloud Compute (EC2) instances and Elastic Block Store (EBS) volumes in the affected data centre were unavailable to customers.

For an hour, customers were unable to launch new EC2 instances or create EBS volumes as the control planes for these two resources keeled over during the power failure.

The Relation Database Services was also hit by a bug that meant some multi-availability zone instances did not complete failover process.

EBS volumes that were in use were hobbled when brought back online, and all activity on them paused so customers could verify data consistency.

It took several hours to recover some EBS volumes completely, a process Amazon said it was working to improve.

At the same time, Amazon’s Elastic Load Balancers (ELBs) encountered a previously unknown bug that caused a flood of requests and created a backlog in other AWS zones not affected by the storm.

Amazon said it would break load balance processing into multiple queues in future to avoid a similar incident as well as to allow faster processing of time-sensitive actions.

It also intends to develop a DNS reweighting backup system that will quickly shift all load balance traffic away from an availability zone in trouble.

Copyright © iTnews.com.au . All rights reserved.


Amazon offers explanation to recent outages
 
 
 
 
Top Stories
eHealth measures missing the point
Opinion: When will the PCEHR lead to patient outcomes?
 
Photos: Google Glass gets real
Coming soon to an office near you.
 
Photos: HTC One vs Samsung Galaxy S4
Android giants battle it out.
 
 
Sign up to receive iTnews email bulletins
   FOLLOW US...

Latest VideosSee all videos »

Bankwest builds continuous delivery capability
Bankwest builds continuous delivery capability
To automatically deploy test/dev sandboxes by mid-year.
Veterans' Affairs sets sights on modernisation
Veterans' Affairs sets sights on modernisation
Data safe with Human Services, CIO says.
Citi Australia drops platform customisations
Citi Australia drops platform customisations
Technology chief shifts focus from building to leveraging systems.
VicRoads restructures IT team
VicRoads restructures IT team
Department moves to align with industry benchmarks.
Zurich Australia extends IT team offshore
Zurich Australia extends IT team offshore
Malaysian staff served from Australian data centres.
Leigh Berrell - Utilities CIO of the Year
Leigh Berrell - Utilities CIO of the Year
Yarra Valley Water CIO Leigh Berrell accepts his Benchmark Award for Utilities CIO of the Year.
Wayne McMahon - Retail CIO of the Year
Wayne McMahon - Retail CIO of the Year
Domino's Pizza CIO Wayne McMahon accepts his Benchmark Award for Retail CIO of the Year.
Inside Perpetual's ongoing IT transformation
Inside Perpetual's ongoing IT transformation
CIO Jenny Levy discusses how outsourcing will help the firm "simplify, refocus and grow".
Managing Complexity - Defence's Daniel McCabe
Managing Complexity - Defence's Daniel McCabe
Daniel McCabe, Assistant Secretary of Australia's Department of Defence, provides the audience at the iTnews Data Centre Strategy Summit with a deep dive into the organisation's data centre consolidation program.
How Facebook designed the data centre from scratch - Marco Magarelli
How Facebook designed the data centre from scratch - Marco Magarelli
The full keynote by Facebook data centre architect Marco Magarelli at the Australian Data Centre Strategy Summit. Magarelli details the design considerations behind the social network's Prineville, Oregon; North Carolina and Luleå, Sweden data centres.
Modernising Legacy Data Centres - Telstra's Jon Curry
Modernising Legacy Data Centres - Telstra's Jon Curry
Telstra general manager of managed data centres Jon Curry guides the audience at the iTnews Australian Data Centre Summit through the build of the telco's Clayton, Victoria data centre.
NSW Government launches NABERS data centre rating tools
NSW Government launches NABERS data centre rating tools
Matthew Clark from the NSW Department of Environment guides facilties managers through the details of the new NABERS data centre energy rating tool at the Australian Data Centre Strategy Summit.
NABERS launch panel: Australian Data Centre Strategy Summit
NABERS launch panel: Australian Data Centre Strategy Summit
Matthew Clark (NSW Dept of Environment), Greg Boorer (Canberra Data Centres), Glenn Allan (National Australia Bank), Mike Andrea (Strategic Directions) and Bob Sharon (Green Global Consulting) discuss the impact of the NABERS data centre rating.
Judges notes: Fortescue Metals [The Benchmark Awards]
Judges notes: Fortescue Metals [The Benchmark Awards]
iTnews' panel of judges discuss Fortescue Metals 'New World of Work" project, one of three shortlisted finalists for the Industrials category of the CIO Benchmark Awards.
Judges notes: Retail [The Benchmark Awards]
Judges notes: Retail [The Benchmark Awards]
iTnews' panel of judges discuss the shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: Pacific Aluminium [The Benchmark Awards]
Judges notes: Pacific Aluminium [The Benchmark Awards]
iTnews' panel of judges discuss Pacific Aluminium's lightning fast service desk refresh, one of three shortlisted finalists for the Industrials category of the CIO Benchmark Awards.
Judges notes: Domino's Pizza [The Benchmark Awards]
Judges notes: Domino's Pizza [The Benchmark Awards]
iTnews' panel of judges discuss Domino's Pizza's shift to hosted services, one of three shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: McDonald's Australia [The Benchmark Awards]
Judges notes: McDonald's Australia [The Benchmark Awards]
iTnews' panel of judges discuss McDonald's Australia's new self-service portal for employees, one of three shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: ING Direct [The Benchmark Awards]
Judges notes: ING Direct [The Benchmark Awards]
iTnews' panel of judges discuss ING Direct's 'Bank in a Box', one of three shortlisted finalists for the banking and finance category of the CIO Benchmark Awards.
Judges notes: Yarra Valley Water [The Benchmark Awards]
Judges notes: Yarra Valley Water [The Benchmark Awards]
iTnews' panel of judges discuss Yarra Valley Water's insourcing project, one of three shortlisted finalists for the Utilities category of the CIO Benchmark Awards.
Latest Comments
Polls
Do you prefer the Coalition's NBN policy?

   |   View results
Yes
  19%
 
No
  81%
TOTAL VOTES: 1669

Vote