Inside eBay’s 90PB data warehouse

Engineers build analytics-as-a-service.

eBay has spent the past two years transforming its data analysis and reporting capabilities so that front-line staff can help themselves to its massive data store.

The e-commerce giant stores almost 90PB of data about customer transactions and behaviours to support some $3500 of product sales a second.

Data is stored in three systems, with about 7.5PB in a Teradata enterprise data warehouse, 40PB on commodity Hadoop clusters and 40PB on ‘Singularity’: a custom system for performing deep-dive analysis on semi-structured and relational data.
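
As a quick check on these figures, the three stores add up to roughly 87.5PB, which is consistent with the "almost 90PB" total. A minimal sketch of that arithmetic, using the approximate sizes quoted above (the dictionary keys are labels chosen here for illustration, not eBay system names):

```python
# Approximate sizes in petabytes, as quoted in the article.
# The key names are illustrative labels only.
stores_pb = {
    "teradata_edw": 7.5,
    "hadoop": 40.0,
    "singularity": 40.0,
}

# Combined size of the three systems.
total_pb = sum(stores_pb.values())

# Share of the total held by each system, as a percentage rounded to one decimal.
shares = {name: round(100 * size / total_pb, 1) for name, size in stores_pb.items()}

print(f"Total: {total_pb}PB")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these rounded figures, the Teradata warehouse holds under a tenth of the data, with the remainder split evenly between Hadoop and Singularity.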

eBay analytics platform and delivery director Alex Liang said the site’s growing popularity and usage trends had put a strain on his Shanghai-based engineering team.

“We noticed that smartphone and tablet users visit eBay more times every day than PC users,” he told a Teradata customer conference in Sydney on Thursday.

“We get more visits, which means we get more data, which means we have to process more data. That’s a problem we’ve been resolving for the past two years.”

As of last week, eBay had 500 million live auction listings, split into more than 50,000 categories.

The site has more than 100 million active users, generating up to 100TB of new data each day to be stored and used by more than 6000 eBay staff.

Users range from expert data scientists to non-technical business people and about 50 executives who need access to top-line reports.

“We need to make sure that the people who run the business, people who are doing the innovation, have direct access to the data instead of having people in the middle,” he said.

Liang likened eBay’s previous data analysis ecosystem to an “Ikea job interview” in which applicants would need to build their own chairs.

“When you first come to eBay, you don’t even know where to get data because there is data everywhere,” he said.

“We say, ‘we have all the data there, we have all the tools there’. So when you come to the company we say ‘just use the data’. But the problem is much more complex.”

Building a consistent, user-friendly ecosystem

eBay’s data analysis and reporting capabilities have come a long way since the late 1990s, when it had no data warehouse and used Microsoft Access for financial reporting, Liang said.

Staff now use a custom ‘DataHub’ platform to reach a wide range of data tools, including Microsoft Excel, visualisation tool Tableau, Oracle and MicroStrategy software, SQL and purpose-built applications.

Liang said DataHub represented “three years of heavy investment” during which it had adopted more user-friendly features like a search bar and categories.

A second platform, QuickStrike, offers a range of reporting dashboards to ensure that all users see a consistent set of metrics and key performance indicators.

Before QuickStrike was introduced, Liang said, business areas across the globe were using seven versions of what should have been the same metric.

“[It was] very tough for people to find the right metrics and reports,” he said. “People just go and do their own metrics and reports based on their assumptions and understanding of the data.

“The challenge is that in many cases, they don’t know which metrics and reports are old, which ones are new, which ones are accurate.

“So in many cases people would sit around a table discussing a business problem and think they’re talking about the same thing but … their reports had their own definitions.”

eBay made two teams responsible for QuickStrike data quality and governance: a technology group and a dedicated team of business analysts.
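
The article does not describe how QuickStrike is built. As a rough illustration of the underlying idea, that each metric has one governed definition which every report reuses, here is a minimal sketch. All names in it (`Metric`, `MetricRegistry`, `gross_sales`, the order fields) are hypothetical and are not drawn from eBay's systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metric:
    """One governed metric definition (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[List[dict]], float]


class MetricRegistry:
    """Holds a single definition per metric name and rejects duplicates."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Refuse a second definition so teams cannot fork the metric.
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def evaluate(self, name: str, rows: List[dict]) -> float:
        # Every report computes the metric through the same definition.
        return self._metrics[name].compute(rows)


registry = MetricRegistry()
registry.register(Metric(
    name="gross_sales",
    description="Sum of sale amounts for completed orders only",
    compute=lambda rows: sum(r["amount"] for r in rows if r["status"] == "completed"),
))

# Made-up sample rows, for illustration only.
orders = [
    {"amount": 20.0, "status": "completed"},
    {"amount": 5.0, "status": "cancelled"},
    {"amount": 12.5, "status": "completed"},
]

gross = registry.evaluate("gross_sales", orders)
print(gross)
```

Rejecting a second registration under the same name is one simple way to stop competing definitions appearing; a real governance process would also need ownership, review and versioning, none of which is modelled here.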

The QuickStrike dashboard template took a year to design, Liang said, noting that eBay staff were sharing reports via email as recently as last year.

eBay staff also have access to a third tool, Metrics Explorer, to dig further into any business problems and discover potential solutions through deep data analysis.

Liang said eBay was committed to a multi-platform, self-serve data analytics environment. He said it would look to add a 1PB in-memory database to its trio of platforms in the near term.

“The future will be live,” he said. “You can’t drive your car by looking at the back … when you run a business, you can’t just look at what happened yesterday. You must be able to predict the future.”
