Google explains Apps data centre failure

Mar 10, 2010 12:48 AM
Tags: google | datacentre | failure | outage | app | apps

Honest assessment of February outage.

Google has published a post-mortem of an incident in February in which Google App Engine went down for more than two hours.

All Google App Engine applications were "degraded" from 7:48am to 10:09am PST on 24 February after a power failure at the company's main data centre, the firm said.

About 25 percent of the servers failed within five minutes owing to a delay in back-up power generation. Google's message boards started showing questions from users almost immediately.

"By this time, our primary on-call engineer had determined that App Engine is down," the report said.

"The on-call engineer, according to procedure, paged our product managers and engineering leads to handle communicating the outage to users. A few minutes later, the first post from the App Engine team about this outage is made on the external group."

There was confusion about the instructions for switching to a back-up data centre, and the decision-maker for the crossover could not be found. The team then received data suggesting that the data centre was recovering and that a changeover was not necessary.

However, the data turned out to be inaccurate, which extended the outage considerably. By the time the move to the back-up servers had been made, App Engine had been down for more than two hours.

The report found that Google had not developed plans for a partial data centre failure, nor for determining whether the data centre was able to continue running on such a reduced server count.

The company will now hold regular drills for failure, with a wider spectrum of possible situations, and a bi-monthly audit of all operations documents.

Google claimed that a similar failure today would cause a service slowdown for a maximum of 20 minutes with the new procedures, rather than a complete outage.

Copyright ©v3.co.uk

