Westpac: Quicker to reboot than press DR alarm

 

Why Westpac made the right call to switch off services.

Westpac staff voluntarily switched off ATM, EFTPOS and online banking services yesterday morning, iTnews can reveal, to avert a potentially far more severe outage.

The bank’s ATM, EFTPOS and online banking services were cut after an air conditioning unit failed at Westpac’s Ryde (Sydney) data centre, a fault first noticed at 5am.

ATM and EFTPOS services were back online by 11am, but online banking wasn't available until 4:30pm.

Whilst Westpac won’t be able to provide a post-incident report until next week, a spokesman for the company today explained to iTnews why engineers made the agonising choice to switch off the services.

Upon discovering the cooling fault at 5am, IT engineers at the data centre faced three options: leave the servers and storage operating at dangerous temperatures, which could have resulted in a far more serious meltdown; execute the bank’s business continuity plan and shift workloads to another facility; or switch the machines off until the air conditioning unit could be replaced.

The first option could have exposed Westpac to days or weeks of outages and the potential for data corruption or lost data.

The second option, switching to a secondary disaster recovery facility, was deemed to take too long.

The Westpac spokesman said engineers considered that it would take far less time to switch off the machines, wait for a third party to swap out the cooling units, and reboot. The building is owned by Mirvac and the IT infrastructure is outsourced to IBM.

The right call in the wrong situation?

The key question for Westpac’s board: why would its disaster recovery plan take so long to execute?

iTnews has previously discussed the build of ‘active-active’ data centre configurations, in which ‘warm’ servers in secondary facilities can take on workloads from production systems in far less time than the five-plus hours Westpac took to bring EFTPOS and ATM services back online, or the eight-plus hours it took to restore online banking.

Varghese Jacob, designer of data centres for many blue-chip Australian companies, stressed that the industry "expects disaster recovery rollover times to be fast - a matter of a few minutes or hours."

"It shouldn't be quicker to shut down and reboot," he said.

Whilst Varghese can't speak for Westpac, he said organisations often fail to regularly test the business continuity plans they have in place.

In this case, Westpac’s engineers are likely to have made the right call. But they would have good cause to turn around to the bank’s management and ask why it hadn’t put some of its $4 billion in profits into the best business continuity money can buy.

Surely availability is secondary only to security in terms of the bank’s priorities.

Copyright © iTnews.com.au . All rights reserved.

