Maintenance error caused Facebook's six-hour outage

Undisclosed command triggered meltdown.

Routine maintenance on the network that stitches together Facebook's data centres caused Tuesday's collapse of its global system, which lasted more than six hours.

The outage blocked access to apps for billions of users of Facebook, Instagram and WhatsApp, further intensifying weeks of scrutiny of the US$1 trillion company.

In a blog post, Facebook vice president of engineering Santosh Janardhan explained that the company's engineers had issued a command that unintentionally disconnected Facebook's data centres from the rest of the world.

Users lost access to WhatsApp, which has more than 2 billion users and is one of the world's most popular messaging apps, while employees were also blocked from internal tools.

The outage knocked out tools that engineers would normally use to investigate and repair such outages, making the task even more difficult, Facebook said.

The company said it sent a team of engineers to the location of its data centres to try to debug and restart the systems.

However, it took the company extra time to get engineers inside to work on the servers due to the high physical and system security in place.

Facebook added that the tool it uses to audit such commands had a bug and failed to stop the one that caused the outage.

"Every failure like this is an opportunity to learn and get better," Janardhan wrote.

"From here on out, our job is to ... make sure events like this happen as rarely as possible."
