Wall Street regulator FINRA has put its hand up to build what financial commentators are calling “the biggest database in history” in the AWS public cloud.

FINRA is one of three shortlisted bidders competing to build a “consolidated audit trail” of every single share trade and options order made in US financial markets each day.
The vast market surveillance system is designed to ramp up the amount of data available to investigators so they are never again caught out by an event like the “Flash Crash” of 2010, when the Dow Jones Industrial Average plunged nearly 1000 points and wiped billions off the value of the market in minutes.
The US Securities and Exchange Commission (SEC) has given the database build the green light, and a committee of stock exchanges will vote in early 2017 on who gets paid an estimated US$2.4 billion to construct it.
What makes FINRA's bid different from its competitors - like fintech firm Fidelity National Information Services (FIS), which has partnered with Google cloud services for its own push - is that it has already started.
The regulator has built a version of the system for its own surveillance purposes, ready to scale as soon as it gets the SEC tap.
How big is big?
On its own, FINRA already collects and processes up to 75 billion records on every share transaction on the US market each day.
Speaking at the AWS Re:Invent summit last week, FINRA CIO Steve Randich said that figure equated to "what Visa and Mastercard process over six months".
“Stitch all this data together over weeks and months and then we are talking trillions of records - over 20 petabytes,” he said.
The not-for-profit regulator is responsible for enforcing SEC rules over 90 percent of the US equities market, and about 60 percent of the US options market by volume.
In the business of stamping out fraud and market manipulation, milliseconds are critical. FINRA has to be able to effectively “replay” the whole network of trades in a time-sequenced order - even though the 3876 securities firms and 641,494 brokers under its watch can all be working to marginally different clocks.
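To illustrate the replay problem, here is a minimal Python sketch of merging per-firm event streams into one time-sequenced order. It is not FINRA's implementation: the `TradeEvent` type, the `replay` function and the idea of a known per-firm clock offset are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeEvent:
    firm: str
    local_ts_us: int  # timestamp from the firm's own clock, in microseconds
    detail: str


def replay(streams, clock_offset_us):
    """Merge per-firm event lists into one globally ordered sequence.

    streams: dict mapping firm name -> list of TradeEvent, each list already
             sorted by local_ts_us.
    clock_offset_us: dict mapping firm name -> how far that firm's clock runs
             ahead of the reference clock (hypothetical, assumed known).
    Yields (corrected_ts_us, firm, detail) tuples in ascending time order.
    """
    def corrected(firm, events):
        offset = clock_offset_us.get(firm, 0)
        for ev in events:
            # Shift each timestamp back onto the shared reference clock.
            yield (ev.local_ts_us - offset, ev.firm, ev.detail)

    # heapq.merge lazily combines already-sorted inputs without loading
    # every stream into memory at once.
    return heapq.merge(*(corrected(f, evs) for f, evs in streams.items()))
```

In this sketch a firm whose clock runs 700 microseconds fast has that amount subtracted from each of its timestamps before the streams are interleaved. Real clock drift is rarely a fixed, known constant.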
It has to keep the data for a minimum of two years, because you never know when a fraud prosecution will kick off.
And the 75 billion records daily peak is just today: FINRA’s regulation technology director Brett Shriver said trade volumes are going up around 20 percent every year thanks to trends like high frequency trading.
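As a rough worked example of what that growth rate implies, the Python sketch below compounds the 75 billion daily peak at 20 percent a year. It assumes the rate stays constant, which the article does not claim, and the function name is made up for illustration.

```python
def projected_daily_peak(current_peak, annual_growth, years):
    """Compound a daily record peak forward at a constant annual growth rate."""
    return current_peak * (1 + annual_growth) ** years


if __name__ == "__main__":
    for year in range(6):
        peak = projected_daily_peak(75e9, 0.20, year)
        print(f"year {year}: {peak / 1e9:.1f} billion records per day")
```

Under that assumption the daily peak would reach roughly 187 billion records after five years, more than double today's figure.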
“Like a few Google searches a day”
Randich likened FINRA’s legacy, on-premises solution for dealing with its regulatory data ingestion to “needing to research something and only being able to do a few Google searches a day”.
Its inflexible resources and batch-based processes meant if surveillance teams needed to re-analyse a window of trades, they would have to join a queue for spare capacity on the systems, a wait that could extend into months.
If they really, really needed extra capacity they would have to pull new hardware into their data centres, migrate applications over the weekend, “and hope that come Monday or Tuesday we wouldn’t end up on the front page of the Wall Street Journal,” Shriver said.
Maintenance costs extended into eight digits, and the organisation was left guessing in advance how much storage it might need years down the track.
So when FINRA joined the race to build the consolidated audit trail, it grew impatient and decided to start anyway.
“We could use this architecture for our current surveillance platform and database,” Randich said.
“So we said, let’s go build it now.”
Becoming public cloud gurus
In the middle of this year FINRA stood up a brand new regulatory platform based on Apache's Spark, HBase, and Hive tools, using Amazon EMR with AWS S3 as its primary storage.
Randich said he had to run the gauntlet of naysayers when the regulator decided to go public cloud and open source.
“I had one of the most senior executives at one of the largest technology companies in the world tell me this doesn’t belong in the cloud. It is not going to work,” he said.
“We had streams of proprietary database vendors coming in one-by-one telling us it wouldn’t scale, it wasn’t mature, it won’t work.
“We have proven them all wrong.”
The effort earned FINRA the praise of AWS CEO Andy Jassy, who called the firm “one of the very top practitioners of building on top of AWS” in the world today.
FINRA currently has 2 trillion rows of data in HBase, a number the team expects to grow dramatically.
The impact has been immediate for its investigators, who are now getting results from database queries 400 times faster on average.
“The investigative capacity of our surveillance teams has expanded dramatically,” Randich said.
From a financial perspective, FINRA’s use of AWS spot pricing - its cheap but unpredictable EC2 auctions - has delivered “an order of magnitude in savings” according to Shriver, who says non-time-sensitive queries can be queued up until cheap compute becomes available.
“We can trade off what we want to pay and how fast we need it done. It has been a real game-changer for FINRA to help us keep up with demand,” he said.
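The queue-until-cheap idea Shriver describes can be sketched in a few lines of Python. This is a toy model only: it calls no AWS API, the prices are invented, and `run_deferred_queries` and its parameters are hypothetical names, not anything FINRA has described.

```python
from collections import deque


def run_deferred_queries(queries, observed_prices, max_price):
    """Release one queued query each time the observed price is acceptable.

    queries: iterable of query identifiers, in the order they were queued.
    observed_prices: sequence of spot prices seen over time (invented data).
    max_price: the most the owner is willing to pay per unit of compute.
    Returns (executed, still_pending), where executed is a list of
    (query, price_paid) pairs.
    """
    pending = deque(queries)
    executed = []
    for price in observed_prices:
        if not pending:
            break  # nothing left to run
        if price <= max_price:
            # Cheap enough: run the oldest waiting query at this price.
            executed.append((pending.popleft(), price))
    return executed, list(pending)
```

Raising `max_price` in this model gets queries through sooner at higher cost, and lowering it does the opposite, which is the trade-off between price and speed described above.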
Paris Cowan travelled to AWS Re:Invent as a guest of Amazon Web Services