IBM bungled IT storage repair, says bank

 

Routine component swap ends in seven-hour outage.

IBM personnel were blamed by a major Singapore bank after they reportedly botched a "routine" repair job on a disk storage subsystem, resulting in a seven-hour systems outage.

DBS Group's chief executive Piyush Gupta apologised to customers yesterday while pointing the finger at Big Blue over the bungle. The story was first reported by The Register.

The outage, between 3am and 10am Singapore time on Monday July 5, left customers unable to access banking and ATM services.

Gupta said the outage was "triggered during a routine repair job on a component within the disk storage subsystem connected to our mainframe."

The component reportedly emitted alert messages, leading to a decision to replace it in a "quiet period" - 3am.

The repair was carried out under the watch of IBM Asia Pacific, "the central support unit for all IBM storage systems in the region".

"Unfortunately, while IBM was conducting this routine replacement, a procedural error inadvertently triggered a malfunction in the multiple layers of systems redundancies, which led to the outage," Gupta said.

"We understand from IBM that an outdated procedure was used to carry out the repair."

Gupta said Big Blue informed the bank of the outage at 3am. A "technical command function" consisting of IBM and DBS IT staff then moved in at 3.40am.

A complete system restart at 5.20am failed due to "complications".

DBS' disaster recovery command centre was activated about an hour later.

All services were restored by "lunchtime", Gupta said.

Gupta said the outage had exposed several holes in the bank's disaster recovery processes.

"On hindsight, our internal escalation process could have been more immediate," he said.

"We could also have done more to mobilise broadcast channels to inform customers of the disruption in services first thing in the morning."

He said the bank was doing everything to "prevent an incident of this scale from happening again."

"We take full responsibility for this incident. The matter is obviously of grave concern to us and we are working closely with IBM to ensure that such lapses do not recur or cause such significant impact," Gupta said.

IBM reportedly took responsibility in a separate statement, according to US publication Computerworld.


Comments: 2
umbria
Jul 16, 2010 12:36 PM
It shows how far disaster recovery has improved when a bank system being fully restored by 10am after a disaster is newsworthy. Well done, IBM, I say.
realitybites
Jul 16, 2010 2:38 PM
I agree, well done IB.. wait a minute... didn't IBM get the job of installing the new payroll system for QLD health ? :P