How Facebook ships software at warp speed

The tools for speed

Facebook invests significant resources in its developers.

No expense is spared on test and dev servers - the last batch boasted 24 cores, 144GB of RAM and 2.7TB of FusionIO flash storage.

Jobs are scheduled on a simple Facebook application which is "about as lightweight" as possible.

"There is no defined process - it is organic and up to the team," Pobar said.

As developers write a feature on these local dev machines, they can test its impact on the latest version of Facebook.com (literally using latest.facebook.com in the browser) in real time by simply pressing the F5 key. HHVM dynamically recompiles the page on the fly.

When a developer wants to ship a change (a 'differential' or just 'diff'), the new code is shared with peers in the engineering team for testing and feedback. If it's generally agreed to be a sound change, the code is committed as a new branch to Facebook's trunk, ready for the weekly ship of code into production every Tuesday.

In the two days leading up to that Tuesday "push", other Facebook employees can run the changes on 'latest.Facebook.com' and file bug reports, check for errors in the logs, and provide other feedback before any member of the public sees the code in production.

But while code might run fine in dev and test, it might not run as soundly when facing the onslaught of millions of users. Facebook has developed two key A/B testing tools and processes to see how changes might fare in the real world.

The most formidable of them is Perflab. Perflab allows developers to replay 24 hours of real user traffic against a copy of Facebook.com running the new change, in drastically less time than 24 hours.

"Perflab takes a snapshot of the last 24 hours of traffic from real users on Facebook.com, anonymises it, and throws it at 1000 machines," Pobar said. "You basically replay the last 24 hours of traffic on Facebook with your new diffs.

"We measured that 24 hours of traffic and 1000 servers was what was required for a test to give us statistically relevant data. To test a change on a globally-distributed application, you need to capture the curve of traffic as each of the US, Asia and Europe hit their usage peaks. Users in Asia, for example, tend to spend more time on mobile phones rather than PC browsers - you need to test your change against a wide range of request types. We also use a mixture of machine [server] types to replicate how our data centre might react to the change - perhaps it performs better for x percent of them, worse for y percent. Performance improvement or not, if the new HHVM you are testing can tolerate the Perflab traffic, that's a very good indicator it can work in production. If your diff is bad, Perflab is going to tell you."

Pobar said Perflab can currently replay 24 hours of Facebook traffic in 45 to 60 minutes, but that the team is working to reduce that window further over time.
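To put those figures in perspective, replaying 24 hours of traffic in 45 to 60 minutes means the captured traffic is played back roughly 24 to 32 times faster than it originally arrived. The short Python sketch below works through that arithmetic and shows one simple way recorded request times could be compressed for replay. The function names are hypothetical and chosen for illustration only; nothing here describes how Perflab is actually implemented.

```python
def replay_speedup(captured_hours: float, replay_minutes: float) -> float:
    """How many times faster than real time the capture is replayed."""
    return captured_hours * 60 / replay_minutes


def compress_timestamps(timestamps: list[float], speedup: float) -> list[float]:
    """Map recorded request times (seconds) onto a compressed replay schedule.

    Offsets are measured from the first recorded request and divided by the
    speedup, so the shape of the traffic curve is kept while the total
    duration shrinks. This is an illustrative assumption, not Perflab's method.
    """
    if not timestamps:
        return []
    start = timestamps[0]
    return [(t - start) / speedup for t in timestamps]


# Speedup at either end of the 45 to 60 minute window quoted above.
slow_end = replay_speedup(24, 60)
fast_end = replay_speedup(24, 45)

# Requests recorded at the start, after one hour and after 24 hours,
# rescheduled for a 60-minute replay.
schedule = compress_timestamps([0.0, 3600.0, 86400.0], slow_end)
```

At a 60-minute replay, a request recorded one hour into the capture would be sent 150 seconds into the test, and the final request lands at the 3600-second mark.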

An even craftier tool is 'Gatekeeper', which allows an engineer or a team to pilot a new feature in a specific geography, demographic or group before rolling it out to a billion active users.
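The idea of gating a feature by geography or group can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Gatekeeper's real design: the `User` and `Gate` classes, their fields and the hash-based percentage rollout are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    country: str  # e.g. "NZ"; a hypothetical two-letter country code
    is_employee: bool = False


@dataclass(frozen=True)
class Gate:
    """A hypothetical feature gate: every configured condition must pass."""
    name: str
    countries: frozenset = frozenset()  # empty set means no country restriction
    employees_only: bool = False
    rollout_percent: int = 100  # share of otherwise-eligible users, 0 to 100

    def allows(self, user: User) -> bool:
        if self.employees_only and not user.is_employee:
            return False
        if self.countries and user.country not in self.countries:
            return False
        # Hash the gate name with the user id so each user gets a stable
        # bucket from 0 to 99 that differs from gate to gate.
        digest = hashlib.sha256(f"{self.name}:{user.user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.rollout_percent


# Pilot a feature for users in New Zealand only.
nz_pilot = Gate(name="new_feature", countries=frozenset({"NZ"}))
in_pilot = nz_pilot.allows(User(user_id=1, country="NZ"))
outside_pilot = nz_pilot.allows(User(user_id=1, country="US"))
```

Hashing rather than picking users at random keeps each user's experience consistent between page loads, which matters when the team later compares usage data for the two groups.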

"We have all sorts of tools to determine how new features are performing - lots of logging and graphs. But the ultimate test is whether people are actually using it," Pobar said.

"New Zealand is often the country of choice to turn features on and experiment - it is a mini-subset of the world in a lot of ways. We collect data on use, share it with the team and ask ourselves: are we moving the needle in the right direction? Then it's rinse, repeat, iterate."

Copyright © iTnews.com.au . All rights reserved.