Amazon offers explanation to recent outages

 

Elastic load balancer bug caused flood of requests.

Failed generators and a lengthy server reboot process exacerbated outages to Amazon Web Services over the weekend.

A detailed explanation of the outage released by the cloud provider this week attempted to minimise the event — saying a mere seven percent of virtual machine instances in the Virginia availability region were affected by the incident. However, the company conceded the recovery time for those affected was extended by a bottleneck in its server reboot process.

The outage on June 29, caused primariily by a huge thunder storm passing through Virginia, saw several large internet sites including Netflix, Instagram and Reddit fail, and led to widespread criticism of Amazon’s hosting service.

While backup equipment at most of the ten data centres in Amazon’s US East-1 region kicked in as designed — allowing the facility to cope with mains grid power fluctuations caused by the storm — the generators at one site failed to provide stable voltage.

As the generators failed, battery-backed uninterruptible powers supplies took over at the data centre keeping servers running. However, a second power outage roughly half an hour late meant that the already depleted UPSs started to drop off, and servers began shutting down.

Power was restored 25 minutes after the second outage but Elastic Cloud Compute (EC2) instances and Elastic Block Store (EBS) volumes in the affected data centre were unavailable to customers.

For an hour, customers were unable to launch new EC2 instances or create EBS volumes as the control planes for these two resources keeled over during the power failure.

The Relation Database Services was also hit by a bug that meant some multi-availability zone instances did not complete failover process.

EBS volumes that were in use were hobbled when brought back online, and all activity on them paused so customers could verify data consistency.

It took several hours to recover some EBS volumes completely, a process Amazon said it was working to improve.

At the same time, Amazon’s Elastic Load Balancers (ELBs) encountered a previously unknown bug that caused a flood of requests and created a backlog in other AWS zones not affected by the storm.

Amazon said it would break load balance processing into multiple queues in future to avoid a similar incident as well as to allow faster processing of time-sensitive actions.

It also intends to develop a DNS reweighting backup system that will quickly shift all load balance traffic away from an availability zone in trouble.

Copyright © iTnews.com.au . All rights reserved.


Amazon offers explanation to recent outages
 
 
 
Top Stories
Frugality as a service: the Amazon story
Behind the scenes, Amazon Web Services is one lean machine.
 
Negotiating with the cloud email megavendors
[Blog post] Lessons from Woolworths’ mammoth migration.
 
Qld govt to move up to 149k staff onto Office 365
Australia's largest deployment, outside of the universities.
 
 
Sign up to receive iTnews email bulletins
   FOLLOW US...

Latest VideosSee all videos »

The great data centre opportunity on Australia's doorstep
The great data centre opportunity on Australia's doorstep
Scott Noteboom, CEO of LitBit speaking at The Australian Data Centre Strategy Summit 2014 in the Gold Coast, Queensland, Australia. http://bit.ly/1qpxVfV Scott Noteboom is a data centre engineer who led builds for Apple and Yahoo in the earliest days of the cloud, and who now eyes Asia as the next big opportunity. Read more: http://www.itnews.com.au/News/372482,how-do-we-serve-three-billion-new-internet-users.aspx#ixzz2yNLmMG5C
Interview: Karl Maftoum, CIO, ACMA
Interview: Karl Maftoum, CIO, ACMA
To COTS or not to COTS? iTnews asks Karl Maftoum, CIO of the ACMA, at the CIO Strategy Summit.
Susan Sly: What is the Role of the CIO?
Susan Sly: What is the Role of the CIO?
AEMO chief information officer Susan Sly calls for more collaboration among Australia's technology leaders at the CIO Strategy Summit.
Meet the 2014 Finance CIO of the Year
Meet the 2014 Finance CIO of the Year
Credit Union Australia's David Gee awarded Finance CIO of the Year at the iTnews Benchmark Awards.
Meet the 2014 Retail CIO of the Year
Meet the 2014 Retail CIO of the Year
Damon Rees named Retail CIO of the Year at the iTnews Benchmark Awards for his work at Woolworths.
Robyn Elliott named the 2014 Utilities CIO of the Year
Robyn Elliott named the 2014 Utilities CIO of the Year
Acting Foxtel CIO David Marks accepts an iTnews Benchmark Award on behalf of Robyn Elliott.
Meet the 2014 Industrial CIO of the Year
Meet the 2014 Industrial CIO of the Year
Sanjay Mehta named Industrial CIO of the Year at the iTnews Benchmark Awards for his work at ConocoPhillips.
Meet the 2014 Healthcare CIO of the Year
Meet the 2014 Healthcare CIO of the Year
Greg Wells named Healthcare CIO of the Year at the iTnews Benchmark Awards for his work at NSW Health.
Meet the 2014 Education CIO of the Year
Meet the 2014 Education CIO of the Year
William Confalonieri named Healthcare CIO of the Year at the iTnews Benchmark Awards for his work at Deakin University.
Meet the 2014 Government CIO of the Year
Meet the 2014 Government CIO of the Year
David Johnson named Government CIO of the Year at the iTnews Benchmark Awards for his work at the Queensland Police Service.
Q and A: Coalition Broadband Policy
Q and A: Coalition Broadband Policy
Malcolm Turnbull and Tony Abbott discuss the Coalition's broadband policy with the press.
AFP scalps hacker 'leader' inside Australia's IT ranks.
AFP scalps hacker 'leader' inside Australia's IT ranks.
The Australian Federal Police have arrested a Sydney-based IT security professional for hacking a government website.
NBN Petition Delivered To Turnbull's Office
NBN Petition Delivered To Turnbull's Office
UTS CIO: IT teams of the future
UTS CIO: IT teams of the future
UTS CIO Chrissy Burns talks data.
New UTS Building: the IT within
New UTS Building: the IT within
The IT behind tomorrow's universities.
iTnews' NBN Panel
iTnews' NBN Panel
Is your enterprise NBN-ready?
Introducing iTnews Labs
Introducing iTnews Labs
See a timelapse of the iTnews labs being unboxed, set up and switched on! iTnews will produce independent testing of the latest enterprise software to hit the market after installing a purpose-built test lab in Sydney. Watch the installation of two DL380p servers, two HP StoreVirtual 4330 storage arrays and two HP ProCurve 2920 switches.
The True Cost of BYOD
The True Cost of BYOD
iTnews' Brett Winterford gives attendees of the first 'Touch Tomorrow' event in Brisbane a brief look at his research into enterprise mobility. What are the use cases and how can they be quantified? What price should you expect to pay for securing mobile access to corporate applications? What's coming around the corner?
Ghost clouds
Ghost clouds
ACMA chair Chris Chapman says there is uncertainty over whether certain classes of cloud service providers are caught by regulations.
Was the Snowden leak inevitable?
Was the Snowden leak inevitable?
Privacy experts David Vaile (UNSW Cyberspace Law and Policy Centre) and Craig Scroggie (CEO, NextDC) claim they were not surprised by the Snowden leaks about the NSA's PRISM program.
Latest Comments
Polls
Which bank is most likely to suffer an RBS-style meltdown?





   |   View results
ANZ
  21%
 
Bankwest
  9%
 
CommBank
  11%
 
National Australia Bank
  17%
 
Suncorp
  24%
 
Westpac
  19%
TOTAL VOTES: 1448

Vote