IBM bungled IT storage repair, says bank

Routine component swap ends in seven-hour outage.

A major Singapore bank has blamed IBM personnel for reportedly botching a "routine" repair job on a disk storage subsystem, which resulted in a seven-hour systems outage.

DBS Group's chief executive Piyush Gupta apologised to customers yesterday while pointing the finger at Big Blue over the bungle. The story was first reported by The Register.

The outage, between 3am and 10am Singapore time on Monday, July 5, left customers unable to access banking and ATM services.

Gupta said the outage was "triggered during a routine repair job on a component within the disk storage subsystem connected to our mainframe."

The component had reportedly emitted alert messages, leading to a decision to replace it during a "quiet period" at 3am.

The repair was carried out under the watch of IBM Asia Pacific, "the central support unit for all IBM storage systems in the region".

"Unfortunately, while IBM was conducting this routine replacement, a procedural error inadvertently triggered a malfunction in the multiple layers of systems redundancies, which led to the outage," Gupta said.

"We understand from IBM that an outdated procedure was used to carry out the repair."

Gupta said Big Blue informed the bank of the outage at 3am. A "technical command function" consisting of IBM and DBS IT staff then moved in at 3.40am.

A complete system restart at 5.20am failed due to "complications".

DBS' disaster recovery command centre was activated about an hour later.

All services were restored by "lunchtime", Gupta said.

Gupta said the outage had exposed several holes in the bank's disaster recovery processes.

"On hindsight, our internal escalation process could have been more immediate," he said.

"We could also have done more to mobilise broadcast channels to inform customers of the disruption in services first thing in the morning."

He said the bank was doing everything to "prevent an incident of this scale from happening again."

"We take full responsibility for this incident. The matter is obviously of grave concern to us and we are working closely with IBM to ensure that such lapses do not recur or cause such significant impact," Gupta said.

IBM reportedly took responsibility in a separate statement, according to US publication Computerworld.
