REA Group repatriates 500TB of data to Google's Sydney cloud region

By Ry Crozier

Jul 31 2020 6:55AM

Having originally hosted it in Europe.

REA Group moved 500TB of BigQuery data from the EU multi-region to the Sydney region of Google Cloud over a period of five weeks.

The project was performed by REA Group subject matter experts and two consultants from IT services firm Servian, and the work is detailed in a Medium post.

The post notes REA began using Google Cloud “several years ago, primarily focusing on using the data analytics tools and services on the Google Cloud Platform (GCP) technology stack.”

“Back at the beginning of the journey, the region in Sydney did not exist,” Servian senior consultant Pablo Caif wrote.

“Like many other GCP customers at that time, REA Group chose the EU multi-region for analysing their data in BigQuery.

“Fast forward to today, and because of newly established contractual obligations and data sovereignty requirements, REA Group wanted to repatriate its BigQuery EU datasets to the relatively new Sydney region.”

Caif said REA’s GCP-based data workloads served analytical and “critical reporting functions” for sales, marketing, audience and other purposes.

He said Google Cloud Storage (GCS) and its transfer service were “the main technology” used to repatriate the data.

“We used GCS to extract the data into, and then reload it back into BigQuery on the Sydney side,” Caif wrote.
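The extract-and-reload approach Caif describes can be sketched with Google's bq command-line tool. The Python below only builds the commands for a single daily partition and does not run them. It assumes Avro as the interchange format, date-partitioned tables and placeholder project, dataset and bucket names, none of which come from the post. The bucket-to-bucket copy between regions, which REA handled with GCS's transfer service, is left out.

```python
# Sketch only: builds (does not run) bq commands to move one daily partition.
# All table and bucket names are hypothetical placeholders.

def plan_partition_move(src_table: str, dst_table: str, day: str,
                        eu_bucket: str, syd_bucket: str) -> list[str]:
    # Files for this partition live under <table name>/<YYYYMMDD>/ in each bucket.
    prefix = f"{src_table.split('.')[-1]}/{day}"
    # The $YYYYMMDD decorator addresses a single partition; the single quotes
    # stop the shell from treating "$" as a variable.
    src = f"'{src_table}${day}'"
    dst = f"'{dst_table}${day}'"
    return [
        # Export from the EU multi-region to a bucket in the EU. The wildcard
        # lets BigQuery shard exports larger than 1 GB into multiple files.
        f"bq --location=EU extract --destination_format=AVRO "
        f"{src} gs://{eu_bucket}/{prefix}/part-*.avro",
        # (Copy gs://<eu_bucket>/<prefix>/ to the Sydney bucket at this point.)
        # Reload into the same partition of the Sydney table. --replace
        # overwrites only that partition, so a failed load can be retried.
        f"bq --location=australia-southeast1 load --source_format=AVRO --replace "
        f"{dst} gs://{syd_bucket}/{prefix}/part-*.avro",
    ]


for cmd in plan_partition_move("my-project:eu_ds.events", "my-project:syd_ds.events",
                               "20200101", "my-eu-bucket", "my-syd-bucket"):
    print(cmd)
```

Source and destination are different datasets here because a BigQuery dataset's location is fixed when it is created, so the Sydney copy has to live in a new dataset.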

‘Hot data’, meaning the data REA calls upon most frequently, had to be moved “within an aggressive 48 hour window.”

“This, coupled with the need to validate that the data had been migrated successfully and without corruption, made it all the more challenging from an engineering perspective,” Caif wrote.
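One common way to check that a migration finished without corruption, offered here as an assumption rather than as what Servian actually did, is to compare a row count and an order-independent checksum for each partition on both sides. The sketch below does this in plain Python over in-memory rows; at REA's scale the same aggregation would have to run inside BigQuery in each region, with only the small per-partition results compared.

```python
import hashlib
import json
from collections import defaultdict

MASK = (1 << 64) - 1  # keep checksums within 64 bits


def row_hash(row: dict) -> int:
    # Canonical JSON (sorted keys) so column order does not change the hash.
    blob = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return int.from_bytes(hashlib.sha256(blob).digest()[:8], "big")


def partition_fingerprints(rows, partition_key: str) -> dict:
    """Return {partition: (row_count, order-independent checksum)}."""
    acc = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = acc[row[partition_key]]
        entry[0] += 1
        # Modular sum rather than XOR, so duplicated rows cannot cancel out.
        entry[1] = (entry[1] + row_hash(row)) & MASK
    return {p: (count, checksum) for p, (count, checksum) in acc.items()}


def mismatched_partitions(source: dict, target: dict) -> list:
    """Partitions whose count or checksum differs, or that exist on one side only."""
    return sorted(p for p in source.keys() | target.keys()
                  if source.get(p) != target.get(p))


eu_rows = [{"day": "20200101", "id": 1, "price": 500},
           {"day": "20200101", "id": 2, "price": 750}]
syd_rows = list(reversed(eu_rows))  # same rows, loaded in a different order
print(mismatched_partitions(partition_fingerprints(eu_rows, "day"),
                            partition_fingerprints(syd_rows, "day")))
```

Because the checksum ignores row order, the two sides can be compared even though extract and load jobs make no promise to preserve it.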

Caif was at pains not to over-simplify the project.

“When you’re shovelling half a petabyte of data around from one continent to another, things get a lot more interesting and challenging,” he wrote.

“The movement of that much data did in fact throw up a few considerations that weren’t in play for smaller repatriation projects that we’d done in the past.

“For example, BigQuery has limits and quotas for extracting and loading with GCS that we needed to consider and engineer solutions for.”
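The quotas matter because they set a floor on how the work can be scheduled. BigQuery writes at most 1 GB of exported data to a single file, so a large export has to be spread across many files using a wildcard URI, and Google has also capped the bytes that can be extracted per day on the shared slot pool. The back-of-envelope sketch below treats both limits as parameters; the 50 TB per day default is an assumption to check against Google's current quotas page, not a figure from the post.

```python
# Back-of-envelope estimate. Both limits are parameters; the defaults are
# assumptions to be checked against current BigQuery quotas.
GB = 10**9
TB = 10**12


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-a // b)


def export_estimate(table_bytes: int,
                    max_file_bytes: int = 1 * GB,
                    daily_extract_bytes: int = 50 * TB) -> tuple[int, int]:
    """Return (minimum number of export files, minimum days of extraction)."""
    return (ceil_div(table_bytes, max_file_bytes),
            ceil_div(table_bytes, daily_extract_bytes))


print(export_estimate(100 * TB))  # (100000, 2)
print(export_estimate(500 * TB))  # (500000, 10)
```

The point is the shape of the calculation: the per-file limit drives how many objects the transfer service has to move, and any daily cap drives how many days an extraction must be spread over.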

The repatriation covered a few “hot data” tables that exceeded 100TB in size and were being updated in real time by streaming jobs.

“Migrating these tables were by far the most technically challenging hurdles that we needed to overcome,” Caif wrote.

Servian wound up breaking the tables into smaller chunks that could be migrated more easily and that stayed within the data extraction limits set by Google.
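The post does not say exactly how the tables were split. One plausible approach, assumed here purely for illustration, is to group consecutive daily partitions into chunks whose total size stays under a chosen budget, so that each chunk can be exported as a unit. The partition IDs, sizes and budget below are hypothetical.

```python
def chunk_partitions(partitions: list[tuple[str, int]],
                     max_chunk_bytes: int) -> list[list[str]]:
    """Group consecutive (partition_id, size_bytes) pairs into chunks whose
    total size is at most max_chunk_bytes. A partition larger than the budget
    ends up alone in its own chunk and would need further splitting."""
    chunks, current, current_bytes = [], [], 0
    for pid, size in partitions:
        # Close the current chunk if adding this partition would exceed the budget.
        if current and current_bytes + size > max_chunk_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(pid)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical partition sizes in TB, with an 8TB budget per chunk.
parts = [("20200101", 4), ("20200102", 3), ("20200103", 5),
         ("20200104", 12), ("20200105", 1)]
print(chunk_partitions(parts, 8))
# [['20200101', '20200102'], ['20200103'], ['20200104'], ['20200105']]
```

Keeping each chunk to a contiguous date range means it maps onto a simple partition filter on export, and onto a known set of partitions when it is reloaded.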

“On the other side, when reloading them back in, we of course needed to reassemble/recombine them back into one table with the right partitions,” Caif wrote.

“This also involved some more heavy duty engineering effort.”
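Before chunks are recombined into one partitioned table, it helps to confirm that they cover every source partition exactly once, since a gap or an overlap would silently drop or double-load data. The check below is a hypothetical helper, not code from the post; it works on lists of partition IDs such as the daily partition dates of a table.

```python
from collections import Counter


def check_reassembly(source_partitions: list[str],
                     chunks: list[list[str]]) -> dict[str, list[str]]:
    """Report partitions that are missing, loaded more than once, or not in the source."""
    loaded = Counter(p for chunk in chunks for p in chunk)
    expected = set(source_partitions)
    return {
        "missing": sorted(expected - set(loaded)),
        "duplicated": sorted(p for p, n in loaded.items() if n > 1),
        "unexpected": sorted(set(loaded) - expected),
    }


report = check_reassembly(
    ["20200101", "20200102", "20200103"],
    [["20200101", "20200102"], ["20200102"]],
)
print(report)
# {'missing': ['20200103'], 'duplicated': ['20200102'], 'unexpected': []}
```

Only once all three lists are empty would each chunk's files be loaded into their partitions of the target table.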
