Called Presto, the query engine is able to use a variety of data bases, from multiple Hadoop Distributed File System (HDFS) sources to proprietary data stores. The purpose of Presto is to provide fast interactive analytics with the speed of commercial data warehouses, and was written from the ground up by Facebook.
It scales to very large sized data sources, and Facebook itself uses Presto for its 300 petabyte data warehouse, with over a thousand employees running 30,000 queries daily, scanning over a petabyte of information each time.
A large subset of standard ANSI SQL is supported, but as of yet, cannot write output data back into tables with queries being streamed to clients instead.
Facebook engineer Martin Traverso said in a blog post that Presto came about after the social network's data warehouse grew to petabyte scale, and it wanted an interactive system optimised for low latency for queries against its HDFS-based clusters.
As a result, Presto does not use MapReduce and does all processing in memory which in turn is pipelined through the network between stages to avoid unnecessary I/O which would cause greater latency for queries.
According to Traverso, Presto is ten times better than Hive and MapReduce for processor efficiency and query latency in most cases. Traverso says Presto was written in Java for speed of development and "has a great ecosystem" while being easy to integrate with other components Facebook has built in the same language.