PayPal's senior technology architect has warned end users against being hoodwinked into buying 'Big Data' solutions and adopting flatter databases they don't need.
Daniel Austin urged his peers this week not to become "victims of technological fashion" such as NoSQL databases, during an entertaining presentation entitled "Big Data is a Big Scam" at MySQL Connect in San Francisco.
"Only use big data solutions when you have a big data problem," he told attendees.
"Too many CIOs feel the need to find a solution to a problem they don't have."
A big data problem defined only by the volume of data an organisation must capture and process is no different from other data issues faced throughout the history of computing, Austin argued.
"We have been able to solve big data problems for a long time — provided we can do it at one location in a batch process," he said.
"The answer to simply having too much data has been solved a number of ways over the years.
"The real problem we're really trying to solve is fast data — a combination of large datasets, complex data models and a need to process that data at high frequency.
"The real problem is also more to do with how we manage data stores across vast geographical distances in globally distributed systems. The real problem is — we have a lot of data distributed geographically and need to be able to read and write from anywhere in the world at any time."
Austin said end users should instead define big data according to the true nature of the problem, rather than by the 'solutions' being marketed to address it.
He was particularly incensed by the tendency of self-interested commentators to prematurely declare the death of the relational database model.
In 2010, for example, Charles Silver argued that rows and columns were unable to cope with the large datasets being collected and the analytics applications developed to take advantage of them.
In 2011, fellow entrepreneur Michael Stonebraker similarly predicted that Facebook and other companies that initially used an open source relational database like MySQL would inevitably need to re-write their applications for similar reasons.
These perceived limitations have spurred interest in database management systems that abandon tables and structured query language (SQL) for a simpler, flatter approach focused on speed and the ability to scale horizontally.
Over 120 alternatives have been released under the banner of 'NoSQL', among them the Apache Cassandra and HBase (Hadoop database) pioneered by Facebook, DynamoDB (used by Amazon Web Services), Project Voldemort (used by LinkedIn), Google's Datastore and MongoDB (used by Foursquare, among others).
Austin expressed concern that many organisations now felt the need to abandon their relational database models to remain competitive.
"Big Data is a common set of problems," Austin retorted.
Austin pointed out that the biggest and oldest NoSQL database is actually the DNS (domain name system) that gives order to the internet. It was developed in 1983.
"It worked well long before we thought we had a 'big data' problem," he said.
"But I'm here to tell you you don't have to give up your relational model."
Austin used a system he built on relational database MySQL, which he cheekily named YESQL, to demonstrate his point.
PayPal has built a globally distributed user system, hosted on a global network of five Amazon Web Services data centres, that achieves consistent globally replicated response times of under 350 milliseconds.
"My executives didn't care if I used big data, small data, red data, blue data," he said.
"My execs don't care if it's written on the back of a turtle or papyrus. They only care that it doesn't fail, that it must not lose data, that it must support transactions, that it scales linearly with costs, and that you can write in one place and read it in another within a second."
Austin advocated a horses-for-courses approach when choosing between relational databases like MySQL and flatter NoSQL alternatives.
"MySQL is mature, it's been around 15 years, and in many cases it's better," he said.
"NoSQL [database management systems] are suitable for simple processing, but they offer relatively low levels of maturity."
NoSQL might be able to deliver the results of a simple query faster, he added, but he warned that it tended to require many extensions — and "many extensions complicate a simple design".
DNS might be fast, he said by way of example, but DNS variability is very high. As Facebook's engineers stressed in a recent Tech Talk, it is the consistency of performance, rather than raw speed, that matters the most to customers when using an online service, which is why a relational database still has a place at the social network giant.
"Not all big data solutions are created equal — you need to think about the trade-offs you need to make between consistency, availability, performance and variability," Austin said.
Austin joined database administrators at Twitter and Facebook in revealing the intimate details of their respective MySQL deployments.