How REA Group weathered the AWS cloud outage

By Allie Coyne

Jun 20 2016 12:00PM

Independent systems in multi-region architecture.

Real estate giant REA Group made it through the recent Amazon Web Services Sydney availability zone outage relatively unscathed thanks to a multi-region and multi-availability zone cloud architecture.

Earlier this month one of AWS' Sydney availability zones went under after bad weather triggered a failure in the company's uninterruptible power supply (UPS) setup.

The outage sent some of Australia's biggest web properties scrambling when EC2 instances and EBS volumes in the AZ became unreachable and other services including Elasticsearch, APIs and internal DNS experienced flow-on problems.

REA Group, a heavy user of AWS services, was one of those affected, but managed to get away with only a broken ad server, one offline web app, a wobbly Android application and slightly slower response times for some services.

"... while we weren’t totally unaffected, it was overall a satisfying outcome," senior technical lead Jeremy Burton said.

Being prepared ... and lucky

While the outage has prompted many to reconsider their cloud architecture, REA Group says designing for failure - coupled with "a bit of luck" - helped it weather the storm.

REA's production systems are deployed in a multi-availability zone setup by default. Its most critical systems - as well as those like Redshift that don't offer multi-AZ options - have been architected to run across multiple regions, specifically in Frankfurt and Sydney.

The IT team operates independent copies of the systems that interact with REA's master data store in each region for eventual consistency, Burton said.

"The only thing that will be common is the source of the data," he wrote.

"In this way, if one region has problems, the other is totally unaffected."
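The arrangement Burton describes can be illustrated with a minimal sketch. It assumes, purely for illustration, that the shared source of data is an append-only change log that each regional copy replays at its own pace; the article does not say how REA's systems actually synchronise, and every class and key name below is hypothetical.

```python
class MasterStore:
    """Hypothetical stand-in for the shared source of data: an append-only log."""

    def __init__(self):
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.log.append((key, value))


class RegionalCopy:
    """An independent regional copy that replays the master's change log."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.offset = 0        # how far into the log this copy has read
        self.available = True  # False simulates a regional outage

    def sync(self, master):
        """Apply any unseen changes; do nothing while the region is down."""
        if not self.available:
            return
        for key, value in master.log[self.offset:]:
            self.data[key] = value
        self.offset = len(master.log)


master = MasterStore()
sydney = RegionalCopy("sydney")
frankfurt = RegionalCopy("frankfurt")

master.write("listing-1", "for sale")
sydney.sync(master)
frankfurt.sync(master)

# Sydney has problems: it misses the next update, Frankfurt does not.
sydney.available = False
master.write("listing-1", "sold")
sydney.sync(master)
frankfurt.sync(master)
stale_in_sydney = sydney.data["listing-1"]
fresh_in_frankfurt = frankfurt.data["listing-1"]

# Sydney recovers and catches up from the common source.
sydney.available = True
sydney.sync(master)
converged = sydney.data == frankfurt.data
```

The two copies share nothing except the log they read from, so a failure in one cannot block the other; the trade-off is that a recovering region serves stale data until it catches up.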

API clients can talk across regions if local copies aren't available, Burton said, using a combination of Amazon Route 53 latency-based routing and Route 53 health checks.

This approach kicked in during the recent Sydney AZ outage - "one of our services automatically flipped over to our European region when some of its instances had problems," Burton said.
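The decision that latency routing combined with health checks produces can be sketched as follows. This is a local simulation of the selection logic only, not the Route 53 API, and the latency figures and the `RegionEndpoint` type are hypothetical; the region codes are the standard AWS identifiers for Sydney and Frankfurt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionEndpoint:
    """One regional copy of a service (values are illustrative only)."""
    region: str
    latency_ms: float  # latency as seen from the client's location
    healthy: bool      # result of the most recent health check


def pick_endpoint(endpoints: List[RegionEndpoint]) -> Optional[RegionEndpoint]:
    """Return the lowest-latency endpoint that passes its health check.

    Returns None when no region is healthy.
    """
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.latency_ms)


endpoints = [
    RegionEndpoint("ap-southeast-2", latency_ms=20.0, healthy=True),   # Sydney
    RegionEndpoint("eu-central-1", latency_ms=300.0, healthy=True),    # Frankfurt
]

# Normal operation: a client near Sydney is answered by the Sydney copy.
normal = pick_endpoint(endpoints)

# Sydney fails its health check, so the same client is sent to Frankfurt.
endpoints[0].healthy = False
failover = pick_endpoint(endpoints)
```

Clients pay the extra cross-region latency only while the local copy is unhealthy, which is consistent with the slightly slower response times REA reported for some services.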

Additionally, continuing to host some of its core systems in its in-house data centre and deploying static assets straight to S3 helped REA avoid severe downtime.

"S3 by its nature is more durable than an EC2 instance, and more likely to survive an AZ failure," Burton said.

"It’s multi-AZ by default, and while the events of the weekend have shown that just being multi-AZ isn’t necessarily enough to be resilient to an AZ failure, the S3 service held up well."

Deep pockets required

However, be prepared to see your infrastructure costs double when adopting a multi-region approach, Burton warned.

"It takes well-architected systems to function under eventual consistency, and to be decoupled in a way that allows redundancy in appropriate parts of the infrastructure," he said.

"Making your infrastructure immutable comes at some automation cost.

"And in some cases, it’s just not worth it. Either the SLAs don’t indicate a need for multi-region, or the system isn’t critical enough to justify the engineering or infrastructure expense."
