Facebook makes 30 petabyte Hadoop migration

By Liam Tung

Aug 1 2011 7:02AM

Replicates system that supports Ad Network to new data centre.

Facebook has revealed it developed a replication system to move a 30 petabyte (PB) file system to a new data centre in Oregon.

Facebook’s data warehouse Hadoop cluster grew by 10 PB over the year to March 2010, hitting 30 PB, which forced the data centre move.

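Hadoop is an open source framework for distributed storage and processing, developed under the Apache Software Foundation. Its data is held in the Hadoop Distributed File System (HDFS).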

Facebook used the Hive data warehousing framework and its massive Hadoop cluster for internal analysis and to support products such as the Facebook Ad Network.

Engineer Paul Yang said that physically moving the systems to the new location was "not a viable option as our users and analysts depend on the data 24/7, and the downtime would be too long".

The data warehouse hardware comprised 2000 machines, each with 12 terabytes of storage and 32GB of RAM: 1200 8-core machines and 800 16-core machines.
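
The cluster-wide totals implied by those figures can be worked out directly. The sketch below uses only the numbers quoted above and assumes decimal units (1 PB = 1000 TB); the variable names are illustrative and do not come from Facebook.

```python
# Figures as quoted in the article; names are illustrative only.
MACHINES_BY_CORES = {8: 1200, 16: 800}  # cores per machine -> number of machines
DISK_TB_PER_MACHINE = 12
RAM_GB_PER_MACHINE = 32

total_machines = sum(MACHINES_BY_CORES.values())
total_cores = sum(cores * count for cores, count in MACHINES_BY_CORES.items())
raw_disk_pb = total_machines * DISK_TB_PER_MACHINE / 1000  # assumes 1 PB = 1000 TB
total_ram_tb = total_machines * RAM_GB_PER_MACHINE / 1000  # assumes 1 TB = 1000 GB

print(total_machines, total_cores, raw_disk_pb, total_ram_tb)
```

On these figures the cluster had 22,400 cores, 64 TB of RAM and 24 PB of raw disk. The hardware description does not say how that raw figure relates to the 30 PB file system size.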

To support the migration Facebook’s engineering team developed a replication system to mirror changes from the old cluster onto a new larger one, which would allow Facebook to “redirect everything” at switchover time.

“This approach is more complex as the source is a live file system, with files being created and deleted continuously,” said Yang.

The first step was a bulk copy of the data to the new destination, using Hadoop tools such as DistCp.

Then, using its new replication system, Facebook dealt with file and metadata changes that occurred after the bulk copy process was started, explained Yang.

“File changes were detected through a custom Hive plug-in that recorded the changes to an audit log. The replication system continuously polled the audit log and copied modified files so that the destination would never be more than a couple of hours behind,” he said.
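
The polling pattern Yang describes can be illustrated in a few lines. This is a minimal sketch, not Facebook's replication system: the `Cluster` and `AuditLog` classes, the `replicate_once` function and the file paths are all hypothetical, and in-memory dictionaries stand in for the two file systems.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a file system: path -> contents."""
    files: dict = field(default_factory=dict)


@dataclass
class AuditLog:
    """Hypothetical append-only log of (operation, path) entries."""
    entries: list = field(default_factory=list)

    def record(self, op, path):
        self.entries.append((op, path))


def replicate_once(source, dest, log, cursor):
    """Apply log entries from `cursor` onward and return the new cursor."""
    for op, path in log.entries[cursor:]:
        if op == "delete" or path not in source.files:
            # The source is live, so a file may vanish after its write was logged.
            dest.files.pop(path, None)
        else:
            dest.files[path] = source.files[path]
    return len(log.entries)


source, dest, log = Cluster(), Cluster(), AuditLog()

source.files["/warehouse/a"] = "v1"
log.record("write", "/warehouse/a")
cursor = replicate_once(source, dest, log, 0)

# Changes keep arriving on the live source between polls.
source.files["/warehouse/a"] = "v2"
log.record("write", "/warehouse/a")
source.files["/warehouse/b"] = "v1"
log.record("write", "/warehouse/b")
del source.files["/warehouse/b"]
log.record("delete", "/warehouse/b")

cursor = replicate_once(source, dest, log, cursor)
print(dest.files == source.files)
```

A real system would call something like `replicate_once` on a timer. How far the destination lags then depends on the polling interval and on how long each batch of copies takes.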

When the engineers were ready to begin the switchover, they “set up camp in a war room” and shut down the older JobTracker - the Hadoop service that scheduled tasks to the right node - and fired it up at the new location.

“Once replication was caught up, both clusters were identical, and we changed the DNS entries so that the hostnames referenced by Hadoop jobs pointed to the servers in the new cluster,” said Yang.
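
The order of operations in that switchover, confirming the clusters match and only then repointing hostnames, can be sketched as follows. This is an illustration under stated assumptions rather than Facebook's procedure: the functions, the listing format (path mapped to a checksum string), the hostname and the addresses are all hypothetical, and a dictionary stands in for DNS.

```python
def diff_listings(old, new):
    """Return paths that are missing, extra or different on the new cluster."""
    missing = sorted(old.keys() - new.keys())
    extra = sorted(new.keys() - old.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return missing, extra, changed


def switch_over(dns, hostname, new_address, old, new):
    """Repoint `hostname` only if both listings are identical."""
    missing, extra, changed = diff_listings(old, new)
    if missing or extra or changed:
        return False  # replication has not caught up; leave DNS untouched
    dns[hostname] = new_address
    return True


# Hypothetical data: path -> checksum on each cluster.
old_cluster = {"/warehouse/a": "c1", "/warehouse/b": "c2"}
new_cluster = {"/warehouse/a": "c1"}
dns = {"namenode.example.com": "192.0.2.10"}

first_try = switch_over(dns, "namenode.example.com", "198.51.100.10",
                        old_cluster, new_cluster)

new_cluster["/warehouse/b"] = "c2"  # replication catches up
second_try = switch_over(dns, "namenode.example.com", "198.51.100.10",
                         old_cluster, new_cluster)

print(first_try, second_try, dns["namenode.example.com"])
```

Because jobs refer to hostnames rather than addresses, changing the record is enough to redirect them without editing the jobs themselves.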

Yang believed Facebook's successful replication of the Hadoop cluster may improve the appeal of Hadoop and Hive to the enterprise.
