NAB deploys Chaos Monkey to kill servers 24/7


Engineers allowed full night's sleep.

The National Australia Bank has deployed the Netflix-developed 'Chaos Monkey' tool on a 24/7 basis to relieve its website development team of having to respond to server emergencies outside work hours.

Netflix developed the application to constantly test the resiliency of its Amazon-based infrastructure. It randomly kills servers within the architecture to make sure the system can compensate for the failure.
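
The idea can be illustrated with a minimal sketch in Python. It is a simulation only, not Chaos Monkey's actual code and not NAB's setup: the fleet is an in-memory set, and the names `chaos_round` and `launch_replacement` are hypothetical stand-ins for the tool and for whatever mechanism replaces lost capacity.

```python
import itertools
import random

# Hypothetical ID source for replacement servers (illustration only).
_new_ids = itertools.count(1)


def launch_replacement():
    """Stand-in for the platform starting a fresh server."""
    return f"replacement-{next(_new_ids)}"


def chaos_round(fleet, desired_size, rng):
    """Kill one random server, then top the fleet back up to desired_size.

    `fleet` is a set of server names and is modified in place.
    Returns the name of the server that was killed, or None if the fleet was empty.
    """
    victim = rng.choice(sorted(fleet)) if fleet else None
    if victim is not None:
        fleet.remove(victim)  # simulated failure
    while len(fleet) < desired_size:
        fleet.add(launch_replacement())  # simulated recovery
    return victim


if __name__ == "__main__":
    rng = random.Random(0)
    fleet = {"web-a", "web-b", "web-c"}
    for _ in range(3):
        killed = chaos_round(fleet, desired_size=3, rng=rng)
        print(f"killed {killed}; fleet now has {len(fleet)} servers")
```

The point of the exercise is the check after each round: if the fleet does not return to its desired size, the architecture cannot yet absorb the loss of a server.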

NAB migrated the public-facing areas of its website to the AWS public cloud in September last year.

Speaking at the Amazon Web Services Sydney summit today, the bank's head of digital and online channel services, David Broeren, said the effort was aimed as much at staff resiliency as IT resiliency.

"There are tens of billions of dollars that go through the bank every day, it is a very stressful job, so if there is anything I can do to make that job easier I will," he said.

Chaos Monkey runs directly on the production environment, which Broeren said is the only way to get the full effect of the tool.

"We have it going 365 days a year, 24/7. It is running now - it could be killing a server as we speak."

Joining the NAB menagerie is the 'Bees with Guns' load testing tool, which Broeren and his team use in their development environment to ensure new releases can cope with "brute force" caused by spikes in demand.

The AWS cloud alerting tool then triggers an automatic scaling out of resources available to the website to deal with the increase.
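
A rough sketch of that scale-out and scale-in decision, in Python, is shown here. The article does not describe NAB's alarm configuration, so every number is an assumption: the CPU thresholds, the one-server step and the fleet limits are hypothetical, and `desired_capacity` is an illustrative function, not an AWS API.

```python
def desired_capacity(current, cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0,
                     minimum=2, maximum=10):
    """Return the server count to aim for, given average CPU load.

    All thresholds and limits are assumed values for illustration.
    """
    if cpu_percent >= scale_out_at:
        target = current + 1  # load spike: add a server
    elif cpu_percent <= scale_in_at:
        target = current - 1  # load has gone: remove a server
    else:
        target = current      # within the normal band: no change
    # Never drop below the minimum or grow past the maximum.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    servers = 2
    # Simulated load test: demand ramps up, then the "bees" are taken away.
    for cpu in [40, 85, 90, 80, 50, 20, 15, 10]:
        servers = desired_capacity(servers, cpu)
        print(f"cpu={cpu}% -> {servers} servers")
```

In this simulation the fleet grows while the load is high and shrinks back to its starting size once the load stops, which mirrors the behaviour Broeren describes.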

"From there it's pretty simple, you take the bees away and Amazon tethers us back to where we started."

The new tools have allowed NAB to remove the monitoring thresholds that would flash orange when servers began to struggle and set phones ringing at all hours of the day.

"Autoscale, plus Chaos Monkey, actually takes something that would traditionally be a high severity incident - that is the loss of a server - and turns it into a [much less worrying] information incident."

"It has allowed us to give that time back and that is the investment into a resilient workforce," he said. "We have given our people back a quality of life that they didn't have."
