Growing pains: Amazon EC2 suffers huge outage

Powered by SC Magazine
 

Described as the "worst outage in cloud computing history".

A large-scale outage affecting Amazon Web Services' Elastic Compute Cloud (EC2) over the Easter break has highlighted one of the many risks associated with running thousands of applications from large clusters of virtualised servers.

The outage - which began Thursday last week and still impacted a number of Amazon customers as late as Tuesday - took out popular social networking services Foursquare, FormSpring, Heroku, HootSuite, Quora and Reddit.

It also impacted many IT service providers that use Amazon as part of the services they deliver to end users, such as cloud management software vendor RightScale.

The outage began Thursday at 6pm (Sydney time), when customers hosting applications on Amazon's cloud compute service (EC2) at the US-EAST-1 data centre (in Virginia) began experiencing connectivity problems, high latency and elevated error rates.

On any normal day, Amazon’s Elastic Block Store (EBS) - a giant storage area network - dynamically distributes small volumes of storage capacity to thousands of the physical servers hosting Amazon EC2 virtual server instances and to applications using Amazon’s Relational Database Service (RDS).

According to Amazon's status updates, a mysterious network event caused software monitoring of the EC2 network to incorrectly calculate that there was insufficient redundancy available to meet the needs of these server and database instances.

The software automatically attempted to move resources around the network to adjust – in effect a mass re-mirroring of storage that flooded the network with traffic.
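To give a feel for why a mass re-mirroring can swamp a network, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not a description of Amazon's systems: the function names, the volume count, the volume size and the link speed are all invented for illustration, and the only idea taken from the status updates is that a false alarm about redundancy triggered copying on a large scale.

```python
# Toy model only: every number and name here is a hypothetical assumption,
# not a figure published by Amazon.

def remirror_demand_gb(volumes: int, volume_gb: float, fraction_flagged: float) -> float:
    """Data (in GB) the cluster tries to copy when a given fraction of
    volumes is judged to lack a healthy replica."""
    return volumes * volume_gb * fraction_flagged


def hours_to_drain(demand_gb: float, link_gbps: float) -> float:
    """Hours needed to copy demand_gb over a shared link rated at
    link_gbps gigabits per second, with no other traffic on it."""
    gb_per_hour = link_gbps / 8 * 3600  # gigabits/s -> gigabytes/hour
    return demand_gb / gb_per_hour


if __name__ == "__main__":
    scenarios = [("routine failure", 0.001), ("mass false alarm", 0.5)]
    for label, fraction in scenarios:
        demand = remirror_demand_gb(volumes=100_000, volume_gb=100, fraction_flagged=fraction)
        hours = hours_to_drain(demand, link_gbps=100)
        print(f"{label}: {demand:,.0f} GB to copy, roughly {hours:.1f} hours of link time")
```

Under these made-up assumptions, the gap between the two scenarios is the point: a replication system sized for a trickle of routine failures faces a backlog hundreds of times larger when a large share of volumes is flagged at once.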

In the chaos that ensued, Amazon ran out of capacity in its US-EAST-1 Availability Zone – with services failing faster than Amazon engineers could re-provision them.

Six hours after the unexplained network event, Amazon's technicians reported to customers that EBS-backed instances in the US-EAST-1 region were "failing at a high rate."

"Effectively, [Amazon's high availability software] launched a Denial-of-Service attack on their own infrastructure," explained Matt Moor, technical architect at Sydney-based Amazon customer Bulletproof Networks, which fortunately delivers most of its services from its own infrastructure in Australia.

The outage initially affected multiple Amazon ‘availability zones’ (data centres) but by Friday morning, the company reported that most server instances across its compute cloud were again operational - with the exception of any applications hosted in the US-EAST-1 Availability Zone.

Customers with applications hosted in this zone suffered outages well into the weekend as Amazon engineers struggled to bring the large volume of services back online.

"The work we're doing to enable customers to be able to launch EBS backed instances and create, delete, attach and detach EBS volumes in the affected Availability Zone is taking considerably more time than we anticipated," Amazon's engineers reported on the company's status page.

Instances had to be re-provisioned slowly, the company reported, "in order to moderate the load on the control plane and prevent it from becoming overloaded and affecting other functions."
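The idea of bringing instances back slowly so the control plane is not overloaded can be sketched as simple batching. The Python below is a minimal illustration under stated assumptions, not Amazon's recovery procedure: `restore_in_batches`, the instance IDs and the batch size are hypothetical, and a real system would also wait for each batch to report healthy before starting the next.

```python
from collections import deque
from typing import Iterable, Iterator, List


def restore_in_batches(pending: Iterable[str], max_per_tick: int) -> Iterator[List[str]]:
    """Yield instance IDs in batches of at most max_per_tick, so the caller
    never asks the (hypothetical) control plane for more work than that
    in a single step."""
    if max_per_tick < 1:
        raise ValueError("max_per_tick must be at least 1")
    queue = deque(pending)
    while queue:
        size = min(max_per_tick, len(queue))
        yield [queue.popleft() for _ in range(size)]


if __name__ == "__main__":
    # Hypothetical instance IDs; a real caller would pause between batches.
    backlog = [f"i-{n:04d}" for n in range(7)]
    for step, batch in enumerate(restore_in_batches(backlog, max_per_tick=3), start=1):
        print(f"step {step}: restoring {batch}")
```

The trade-off is the one customers experienced: a cap on work per step protects the control plane, but it stretches total recovery time in proportion to the size of the backlog.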

By Monday, Amazon reported that most instances were operational, advising those customers still experiencing issues to "stop and restart your instance in order to restore connectivity."


Copyright © iTnews.com.au . All rights reserved.

