NAB deploys Chaos Monkey to kill servers 24/7

Engineers allowed full night's sleep.

The National Australia Bank has deployed the Netflix-developed 'Chaos Monkey' tool on a 24/7 basis to spare its website development team from responding to server emergencies outside work hours.

Netflix developed the tool to continuously test the resiliency of its Amazon-based infrastructure. It randomly kills servers within the architecture to verify that the remaining systems can compensate for the failure.

NAB migrated the public-facing areas of its nab.com.au website to the AWS public cloud in September last year.

Speaking at the Amazon Web Services Sydney summit today, the bank's head of digital and online channel services, David Broeren, said the effort was aimed as much at staff resiliency as IT resiliency.

"There are tens of billions of dollars that go through the bank every day; it is a very stressful job, so if there is anything I can do to make that job easier, I will," he said.

Chaos Monkey runs directly on the nab.com.au production environment, which Broeren said is the only way to get the full effect of the tool.

"We have it going 365 days a year, 24/7. It is running now - it could be killing a server as we speak."

Joining the NAB menagerie is the 'Bees with Machine Guns' load-testing tool, which Broeren and his team use in their development environment to ensure new releases can cope with the "brute force" of spikes in demand.

AWS's cloud alerting tool then automatically scales out the resources available to the website to handle the increased load.

"From there it's pretty simple, you take the bees away and Amazon tethers us back to where we started."

The new tools have allowed NAB to remove the monitoring thresholds that flashed orange when servers began to struggle and set phones ringing at all hours of the day.

"Autoscale, plus Chaos Monkey, actually takes something that would traditionally be a high severity incident - that is the loss of a server - and turns it into a [much less worrying] information incident."

"It has allowed us to give that time back and that is the investment into a resilient workforce," he said. "We have given our people back a quality of life that they didn't have."

Copyright © iTnews.com.au . All rights reserved.