Westpac: Quicker to reboot than press DR alarm

 

Why Westpac made the right call to switch off services.

Westpac staff voluntarily switched off ATM, EFTPOS and Online Banking services yesterday morning, iTnews can reveal, to avert a potentially far more severe outage.

The bank’s ATM, EFTPOS and online banking services were cut yesterday morning after an air conditioning unit failed at Westpac’s Ryde (Sydney) data centre. The fault was first noticed at 5am.

ATM and EFTPOS services were back online by 11am, but online banking wasn't available until 4:30pm.

Whilst Westpac won’t be able to provide a post-incident report until next week, a spokesman for the company today explained to iTnews why engineers made the agonising choice to switch off the services.

Upon discovering the cooling fault at 5am, IT engineers at the data centre faced three choices: leave the servers and storage operating at dangerous temperatures, which could have resulted in a far more serious meltdown; execute the bank’s business continuity plan and shift workloads to another facility; or switch the machines off until the air conditioning unit could be replaced.

The first option could have exposed Westpac to days or weeks of outages and the potential for data corruption or lost data.

The second option, switching to a secondary disaster recovery facility, was deemed to take too long.

The Westpac spokesman said engineers considered that it would take far less time to switch off the machines, wait for a third party to swap out the cooling units (the building is owned by Mirvac and the IT infrastructure is outsourced to IBM) and reboot.

The right call in the wrong situation?

The key question for Westpac’s board: why would its disaster recovery plan take so long to execute?

iTnews has previously discussed the build of ‘active-active’ data centre configurations, in which ‘warm’ servers in secondary facilities can take on workloads from production systems in less time than the five-plus hours Westpac took to bring EFTPOS and ATM services back online, or the eight-plus hours it took to restore online banking.

Varghese Jacob, designer of data centres for many blue-chip Australian companies, stressed that the industry "expects disaster recovery rollover times to be fast - a matter of a few minutes or hours."

"It shouldn't be quicker to shut down and reboot," he said.

Whilst Varghese can't speak for Westpac, he said organisations often don't regularly test the business continuity plans they have in place.

In this case, Westpac’s engineers are likely to have made the right call. But they would have good cause to turn around to the bank’s management and ask why it hadn’t put aside some of its $4 billion profits into the best business continuity money can buy.

Surely availability is secondary only to security in terms of the bank’s priorities.

Copyright © iTnews.com.au . All rights reserved.


Comments: 8
daver
May 6, 2011 7:40 AM
The second last paragraph is the punch line. You only get robust enough and "fast" DR infrastructure with the right amount of time, technology and ultimately money put aside for it. Oh and how many CRAC units do they have or should they have and type? Let's not forget UPS quantity and capacity!
Danielrollston
May 6, 2011 10:02 AM
With the right control system on things like chillers, remote monitoring of key parameters as part of a vendor maintenance plan could have foreseen any upcoming issues before they happened... I wonder what control system and other equipment they had in there?
RaTTyRaTT
May 6, 2011 10:32 AM
Actually, I noticed the ATM network up by 11am, but the internet banking was back online by 1pm. I paid some bills then :-)

Still, I'm very happy with the response from Westpac, crappy situation - but good handling. If people can handle a short outage like that - then the world is not going to hell as fast as I thought it was. (living in a gimmie, gimmie society...LOL!!!)

Reading between the lines, I would surmise that the 'failure of an AC unit' was probably not the single cause of this - but that is what is publicly being released. My guess is systematic failure of multiple components - to cause the kind of heat imbalance they are stating would have occurred.
I've seen such things downplayed before by others, to avoid a PR disaster, etc. Worst was when a sparky once sliced the cable (silly bugger) on the wrong side of the UPS circuit - which killed the entire DC power environment. (mind, questions were raised why no redundancy existed there... but that's Govt. for ya!)
It was downplayed to state it was a UPS failure, blamed on the installer - quietly swept under the carpet & things moved on. (New DC later - still same crap... LOL!!!)
RaTTyRaTT
May 6, 2011 10:36 AM
Ironically, heat loads by systems these days are actually higher, if you look at some of the testing that has been done by vendors regarding their equipment. The sweet spot (holy grail) has always been around 22 - 24C; however, I have seen numbers around 28 - 31C and still no impact on functionality.

Mind, the density that Westpac probably has - would push above 40C I would guess. That is also because of SAN storage mostly, blades dump a lot of heat - but nothing that can't be dissipated over time. SAN storage just runs HOT. (Still remember the wonderful warm feeling in the dead of winter, standing behind the SAN racks - greatest place to be during -7C nights & 0 - 6C days...)

:-)
Bob
May 6, 2011 2:09 PM
Westpac would have thousands of branches and ATMs and these would access the data centre by a network along with links to other financial institutions. Invoking a DR plan for a major bank would be a big call.

If you are going to pull the plugs and move them to another site, that's going to take a long time, even assuming the DR centre was ready to go. At some stage in the future you are also going to need to bring it all back to the main centre, resulting in another outage.

A disaster is more like a complete loss of the facility, like an earthquake or explosion destroying it, where you are not coming back. A disaster is not someone putting the wrong milk in your latte.
laticslad
May 6, 2011 3:17 PM
In today's age, this is completely unacceptable. Neither Facebook nor Google would be unavailable because of an air conditioner DC problem; they would have seamless switchover to other DCs. For a major bank in Australia that has just released a $4 billion profit not to have such a robust infrastructure is quite frankly unforgivable.
umbria
May 6, 2011 3:18 PM
Brett is right - the IT staff had no other choice, and the blame falls squarely on the board for not agreeing to fund hot-swappable DR facilities for all customer-facing live services. When the failed site is available again, data already updated at the DR site is rolled forward to the offline site, and when they are mirrored again, they return to the normal, redundant operating condition. It is unforgivable that Westpac does not have this arrangement in place. The same applies to all large companies with customer-facing facilities, but especially banks.

RattyRatt and Bob, Westpac's greed left customers all over the world standing at petrol bowsers with no cash to pay for fuel already pumped, and at airport counters unable to pay for flights. It left mothers with full shopping trolleys unable to pay. And online banking was offline for the entire business day, from 0600 to 1630, which would have left many customers without a window in their day to move money to cover direct debits.

This was a major, major failure, which parallel running data centres automatically prevent. Shame.
pameacs
May 7, 2011 8:33 AM
Poor management, so many businesses don't have effective DR plans. Technology today has some excellent products, no matter what vendor, to allow it, and if your vendor can't, you need to be moving to one who can. I remember one of the big banks or stock traders affected directly by 9/11 was operational with partial services in their DR in a few hours and was fully operational there in a few more. It was a case study I think IBM trotted out for a while after, they may still do. Yes it can be done, no it doesn't have to be hellishly expensive if you have good architects who don't have personal allegiances to a vendor and are willing to do their homework. It does have to be tested routinely. This makes them effective.
Imagine if pilots did DR like some organisations do. Hold on, we have engine number 4 out. Ahh, just shut it down. OK, now we have number three out, look, let's just shut them all down just to be on the safe side, hmmmmm. Pilots have training and operations manuals to handle DR and so should Westpac. Let's face it, they are one of 4 businesses that have had record profits for the last ten years consecutively, they have no excuse.
Then again this is probably another reason for not outsourcing, loss of control of important business systems.
