iTnews

AWS Sydney outage prompts architecture rethink

By Allie Coyne on Jun 6, 2016 12:33PM

Customers consider multi-region redundancy.

Last night's outage of an Amazon Web Services Sydney availability zone is prompting some of AWS' biggest local customers to reconsider their architectures to limit damaging downtime in future.

AWS has built its brand on reliability as well as flexibility and cost, but yesterday's storms in Sydney showed that even the public cloud powerhouse isn't immune to nature.

Big-name web properties spent Sunday night scrambling after the bad weather fried hardware in one of Amazon's Sydney data centres, taking EC2 instances and EBS volumes in one of its availability zones offline and creating problems for other AWS services including Elasticsearch and internal DNS.

API call failures in the affected availability zone also meant that customers hosted there were unable to fail over elsewhere, despite having multi-zone redundancy in place for such events.

However, some fared better than others.

The likes of Carsales, Domain, The Iconic, Domino's and REA Group were among a laundry list of major players affected by the outage. (AWS' popularity in Australia forced the company to build two of its own data centres in the city after it outgrew co-lo space just 18 months following its local launch.)

Domain, The Iconic and Domino's experienced extended downtime, whereas Carsales and REA Group suffered minimal impact.

Carsales' use of its own native APIs rather than AWS' offering, and the fact that it hosts parts of its site in Azure, meant the company escaped with a slower, but still fully functional, site for a short time.

The trade-off for this more architecturally tricky arrangement is slightly higher cost and more planning ahead of time, Carsales CIO Ajay Bhatia said.

"The only sure way not to have an outage is not to be online, but second best is to have a balanced plan with a bit of luck," Bhatia said.

"One thing about Carsales is with our model, for example, with dealers where we only charge them when consumers send leads so any outage means we can't charge, so it is super important that we minimise outages."

REA Group has both multi-zone and multi-region failover in place. It deploys to two availability zones simultaneously, so wasn't impacted by the API difficulties.

It got away with just one lost web page, which was hosted in a single availability zone, and a wobbly Android app, because the IT team reloaded immediately onto another zone and controlled its elastic load balancing to stop its sites being routed back to the struggling data centre.
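
As a rough illustration of the idea of steering traffic away from a struggling zone, the sketch below keeps only zones whose recent health checks mostly passed. It is not REA Group's tooling or an AWS API: the `ZoneHealth` record, the `zones_in_rotation` function, the zone names and the 80 percent threshold are all hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class ZoneHealth:
    """Hypothetical summary of recent health checks for one availability zone."""
    name: str
    passed: int
    total: int


def zones_in_rotation(zones, min_pass_ratio=0.8):
    """Return the names of zones that should keep receiving traffic.

    A zone stays in rotation when at least `min_pass_ratio` of its recent
    health checks passed. The threshold is an assumed value, not a
    recommendation.
    """
    healthy = [
        z.name
        for z in zones
        if z.total > 0 and z.passed / z.total >= min_pass_ratio
    ]
    # Assumed design choice: if every zone looks unhealthy, keep routing to
    # all of them rather than dropping all traffic.
    return healthy or [z.name for z in zones]


if __name__ == "__main__":
    report = [ZoneHealth("zone-a", 2, 20), ZoneHealth("zone-b", 20, 20)]
    print(zones_in_rotation(report))  # prints ['zone-b']
```

The decision here depends only on health data the operator already holds, so it does not rely on API calls into the zone that is failing.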

"Multi AZ and ultimately, multi-region, with some smart architecture for deployment is key to cloud resilience today - [as is] having a team of world-class engineers manage the impacts in real time," REA CIO Nigel Dalton said.

"We learned a lot. Power failure is a tough event for anyone to suffer, and we have an A-team of engineers. Others will be learning different, tougher lessons about good AZ management."

Going global?

The impacted enterprises iTnews spoke to said they were now looking at how to shore up their infrastructure against another similarly damaging outage.

But the events of last night don't appear to have deterred them from jumping into bed with a single cloud vendor - rather, they're now looking at redundancy across geographic regions.

"There are more lessons for us. Hopefully [this] will make us better from here like I am sure [it is] with many companies. That is the benefit of such outages - it makes you think what you could do better," Bhatia said.

"... multi region is more important than it was a day ago. I am careful not to make a decision yet though without looking into the full picture that the team must provide now."

Domain CTO Mark Cohen said it was "very very likely" last night's problems would change how his team structured its use of AWS.

"We have a post mortem today and we'll be looking at a couple of plans of attack."

Domain is heavily embedded with AWS, making a multi-cloud environment somewhat difficult. Cohen expects the IT shop will move to a multi-region architecture that makes more use of tools like Chaos Monkey.

Cloud specialist Jeff Waugh said multi-region would be a much more attractive proposition for many organisations than using several vendors.

"You could go multi-region or multi-cloud, but I think multi-region makes a lot more sense - it's a much easier proposition if you're using the same technology stack and then if something terrible happens to all of Sydney you can failover to Singapore," he said.

"Hopefully this will make people a bit more introspective about how they structure their architecture."
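
The multi-region approach Waugh describes, failing over from Sydney to Singapore, can be sketched as an ordered preference list with a health probe. This is a minimal sketch under stated assumptions, not any vendor's API: `pick_region`, the lower-case region labels and the `is_healthy` callback are hypothetical, and a real deployment would also need data replication and DNS or load balancer changes that are not shown.

```python
def pick_region(preferred, is_healthy):
    """Return the first region in `preferred` that the probe reports healthy.

    `preferred` is an ordered list of region labels, primary first.
    `is_healthy` is a caller-supplied function taking a label and returning
    True or False; how it probes the region is left to the caller.
    """
    for region in preferred:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    # Simulate the scenario in the quote: all of Sydney is unavailable.
    sydney_down = lambda region: region != "sydney"
    print(pick_region(["sydney", "singapore"], sydney_down))  # prints singapore
```

Because both regions run the same technology stack, the same deployment artefacts can be reused in the standby region, which is the main argument made above for multi-region over multi-cloud.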

Copyright © iTnews.com.au. All rights reserved.
Tags: aws, cloud, networking
