Facebook makes 30 petabyte Hadoop migration

 

Replicates system that supports Ad Network to new data centre.

Facebook has revealed it developed a replication system to move a 30 petabyte (PB) file system to a new data centre in Oregon.

Facebook’s data warehouse Hadoop cluster grew 10 PB over a the year to March 2010, hitting 30 PB which forced the data centre move.

Hadoop is a distributed file system developed by the Apache Software Foundation.

Facebook used the Hive datawarehousing framework and its massive Hadoop cluster for internal analysis and to support products, such as the Facebook Ad Network.  

Engineer Paul Yang said that physically moving the systems to the new location was "not a viable option as our users and analysts depend on the data 24/7, and the downtime would be too long".

The data warehouse hardware included 2000 12 terabyte machines, split into 1200 8-core machines and 800 16-core machines with 32GB RAM each.

To support the migration Facebook’s engineering team developed a replication system to mirror changes from the old cluster onto a new larger one, which would allow Facebook to “redirect everything” at switchover time.

“This approach is more complex as the source is a live file system, with files being created and deleted continuously,” said Yang. 

The first cab off the rank was a bulk copy using Hadoop applications such as DistCP, which was moved to the new destination.

Then, using its new replication system, Facebook dealt with file and metadata changes that occurred after the bulk copy process was started, explained Yang. 

“File changes were detected through a custom Hive plug-in that recorded the changes to an audit log. The replication system continuously polled the audit log and copied modified files so that the destination would never be more than a couple of hours behind," he said.

When the engineers were ready to begin the switchover, they “set up camp in a war room” and shut down the older JobTracker - the Hadoop service that scheduled tasks to the right node - and fired it up at the new location. 

“Once replication was caught up, both clusters were identical, and we changed the DNS entries so that the hostnames referenced by Hadoop jobs pointed to the servers in the new cluster,” said Yang. 

Yang believed Facebook's successful replication of the Hadoop cluster may improve the appeal of Hadoop and Hive to the enterprise.

Copyright © iTnews.com.au . All rights reserved.


Facebook makes 30 petabyte Hadoop migration
 
 
 
 
Top Stories
ATO commits to complexity
Greater demand, fewer apps.
 
Photos: AusCERT 2013 day two
The second day of the Queensland security conference.
 
The illusion of cognitive computing
Opinion: IBM's Watson is a marketing success.
 
 
Sign up to receive iTnews email bulletins
   FOLLOW US...

Latest VideosSee all videos »

Bankwest builds continuous delivery capability
Bankwest builds continuous delivery capability
To automatically deploy test/dev sandboxes by mid-year.
Veterans' Affairs sets sights on modernisation
Veterans' Affairs sets sights on modernisation
Data safe with Human Services, CIO says.
Citi Australia drops platform customisations
Citi Australia drops platform customisations
Technology chief shifts focus from building to leveraging systems.
VicRoads restructures IT team
VicRoads restructures IT team
Department moves to align with industry benchmarks.
Zurich Australia extends IT team offshore
Zurich Australia extends IT team offshore
Malaysian staff served from Australian data centres.
Leigh Berrell - Utilities CIO of the Year
Leigh Berrell - Utilities CIO of the Year
Yarra Valley Water CIO Leigh Berrell accepts his Benchmark Award for Utilities CIO of the Year.
Wayne McMahon - Retail CIO of the Year
Wayne McMahon - Retail CIO of the Year
Domino's Pizza CIO Wayne McMahon accepts his Benchmark Award for Retail CIO of the Year.
Inside Perpetual's ongoing IT transformation
Inside Perpetual's ongoing IT transformation
CIO Jenny Levy discusses how outsourcing will help the firm "simplify, refocus and grow".
Managing Complexity - Defence's Daniel McCabe
Managing Complexity - Defence's Daniel McCabe
Daniel McCabe, Assistant Secretary of Australia's Department of Defence, provides the audience at the iTnews Data Centre Strategy Summit with a deep dive into the organisation's data centre consolidation program.
How Facebook designed the data centre from scratch - Marco Magarelli
How Facebook designed the data centre from scratch - Marco Magarelli
The full keynote by Facebook data centre architect Marco Magarelli at the Australian Data Centre Strategy Summit. Magarelli details the design considerations behind the social network's Prineville, Oregon; North Carolina and Luleå, Sweden data centres.
Modernising Legacy Data Centres - Telstra's Jon Curry
Modernising Legacy Data Centres - Telstra's Jon Curry
Telstra general manager of managed data centres Jon Curry guides the audience at the iTnews Australian Data Centre Summit through the build of the telco's Clayton, Victoria data centre.
NSW Government launches NABERS data centre rating tools
NSW Government launches NABERS data centre rating tools
Matthew Clark from the NSW Department of Environment guides facilties managers through the details of the new NABERS data centre energy rating tool at the Australian Data Centre Strategy Summit.
NABERS launch panel: Australian Data Centre Strategy Summit
NABERS launch panel: Australian Data Centre Strategy Summit
Matthew Clark (NSW Dept of Environment), Greg Boorer (Canberra Data Centres), Glenn Allan (National Australia Bank), Mike Andrea (Strategic Directions) and Bob Sharon (Green Global Consulting) discuss the impact of the NABERS data centre rating.
Judges notes: Fortescue Metals [The Benchmark Awards]
Judges notes: Fortescue Metals [The Benchmark Awards]
iTnews' panel of judges discuss Fortescue Metals 'New World of Work" project, one of three shortlisted finalists for the Industrials category of the CIO Benchmark Awards.
Judges notes: Retail [The Benchmark Awards]
Judges notes: Retail [The Benchmark Awards]
iTnews' panel of judges discuss the shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: Pacific Aluminium [The Benchmark Awards]
Judges notes: Pacific Aluminium [The Benchmark Awards]
iTnews' panel of judges discuss Pacific Aluminium's lightning fast service desk refresh, one of three shortlisted finalists for the Industrials category of the CIO Benchmark Awards.
Judges notes: Domino's Pizza [The Benchmark Awards]
Judges notes: Domino's Pizza [The Benchmark Awards]
iTnews' panel of judges discuss Domino's Pizza's shift to hosted services, one of three shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: McDonald's Australia [The Benchmark Awards]
Judges notes: McDonald's Australia [The Benchmark Awards]
iTnews' panel of judges discuss McDonald's Australia's new self-service portal for employees, one of three shortlisted finalists for the Retail category of the CIO Benchmark Awards.
Judges notes: ING Direct [The Benchmark Awards]
Judges notes: ING Direct [The Benchmark Awards]
iTnews' panel of judges discuss ING Direct's 'Bank in a Box', one of three shortlisted finalists for the banking and finance category of the CIO Benchmark Awards.
Judges notes: Yarra Valley Water [The Benchmark Awards]
Judges notes: Yarra Valley Water [The Benchmark Awards]
iTnews' panel of judges discuss Yarra Valley Water's insourcing project, one of three shortlisted finalists for the Utilities category of the CIO Benchmark Awards.
Latest Comments
Polls
Do you prefer the Coalition's NBN policy?

   |   View results
Yes
  19%
 
No
  81%
TOTAL VOTES: 1731

Vote