Security researcher Nik Cubrilovic has spent the last few years investigating how company stocks can be traded using information inadvertently leaked out of businesses.
Hedge funds and investors are increasingly using technical tools that take advantage of information leakage and open source intelligence to gain a leg up on other stock traders.
One example of a company servicing this market is Silicon Valley startup Second Measure, which analyses billions of credit card transactions and sells insights drawn from a sample covering between 1 and 2 percent of all US credit card transactions.
This data revealed to traders earlier this year that US food chain Chipotle had struggled to recover from an E. coli scare, despite analysts predicting otherwise.
The credit card transaction data showed sales at the food chain had shrunk significantly in the aftermath, indicating customer numbers were down ahead of the company’s official results announcement - giving stock traders a chance to move before the figures came out.
“Hedge funds have for years invested heavily in new technology to give themselves an information edge over their competitors and ordinary stock market participants, and these efforts have now expanded to a suite of techniques and tactics that involve what can only be best described as online surveillance,” Cubrilovic told the AusCERT 2016 conference last week.
He refers to this as “stock hacks” - trading strategies that are “built based on non-public data obtained using information security techniques”.
The type of trading that lends itself best to stock hacks, he found, is event-based trading - where investors trade stock long or short for a single event, generally results announcements for a listed business.
Cubrilovic used this approach to track the number of Adobe Creative Cloud customers over the last 18 months.
The software giant has spent the past few years transitioning users away from licensed desktop software to a cloud-based subscription model.
"The whole future of the company depended on this transition," Cubrilovic said.
"Every quarter analysts would watch this one number - the number of Creative Cloud subscribers - to see if they were on track to rescue the software business."
Cubrilovic spotted that Adobe used AWS as its infrastructure backend and inadvertently revealed large user IDs - meaning that when Adobe last December reported 4.5 million more Creative Cloud subscribers than expected, Cubrilovic already knew.
"I traded on the information and the stock popped, 8, 9, 10 percent," he said.
Traders rely on various metrics to make event-based trades - including novel things like tracking car park levels at the likes of Walmart to determine sales and therefore revenue - but information leakage and open source intelligence are emerging as growth areas.
Open source intelligence uses data gathered through public websites, media reports, surveys, and geographic information. It can also include things like domain name searches, sweeps of IP address ranges, and indexing and crawling web applications.
Information leakage is the disclosure of any information that describes a system - things like the app’s architecture, internal business practices, data about the app’s users, and employee information.
“Data and information leaks can be described as either design features or errors in an application that unintentionally expose the inner workings of an application or network,” Cubrilovic said in a paper on the topic.
“Information leaks can be used to determine direct and indirect metrics for a company - how big a company is, or how popular its main product is, and using that information to trade its stock with a significant edge over others in the market.”
Leaving user IDs open
As with the Adobe case, many modern web application frameworks expose auto-incrementing user IDs, meaning a user's ID number is identifiable either in the URL or within the application itself.
It means external parties can identify the application's number of users, the growth rate of users, and the order in which IDs were created.
Facebook is often cited as an example - Mark Zuckerberg is identifiable as user ID 4 (the social network has since stopped this practice).
“To find or estimate the number of users, you wouldn't run through every single record - but rather you would sample IDs surrounding a known ID, or take a divide and conquer approach to the entire namespace,” Cubrilovic said.
“The most famous example of this being applied is the British military accurately estimating German tank production during WW2 because the tank serial numbers were incremented by 1.”
To determine how many tanks the Germans were producing, the Allies wrote down the serial numbers of passing tanks and applied an algorithm - take the biggest number observed, divide it by the number of tanks sampled, add the result to the biggest number, and subtract one.
"It turns out you get a really good estimate of how many tanks there are," Cubrilovic said.
"The same applies for web applications. All you have to do to find out how many users they have is create a user account, get the user ID, plug it into the German tank algorithm and come back with an accurate measure."
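The German tank calculation is short enough to sketch in a few lines of Python. This is a minimal illustration of the standard textbook estimator, not code from Cubrilovic's paper; it assumes IDs start at 1, increase by 1, and that the observed IDs are a random sample. The function name and the sample values are made up for the example.

```python
def estimate_total(observed_ids):
    """Estimate the highest ID issued from a random sample of sequential IDs.

    Standard German tank estimator: m + m/k - 1, where m is the largest
    observed ID and k is the number of IDs observed.
    Assumes IDs run 1, 2, 3, ... with no gaps (an assumption for this sketch).
    """
    k = len(observed_ids)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(observed_ids)
    return m + m / k - 1


# Hypothetical sample of four observed IDs.
sample = [19, 40, 42, 60]
print(estimate_total(sample))  # 60 + 60/4 - 1 = 74.0
```

The more IDs in the sample, the smaller the correction added to the largest ID seen. An ID taken from a freshly created account is not a random sample, since it is already close to the current maximum, so in that case the ID itself is the more direct reading.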
Traders can use techniques like this as well as namespace, user name and email sampling, and sweeps of net ranges to find live hosts that suggest the number of active servers and therefore the size of the application, he said.
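The "divide and conquer" approach to an ID namespace that Cubrilovic mentions can be sketched as a doubling search followed by a binary search. This is an illustrative sketch only: it assumes every ID from 1 up to the newest one exists, and the `id_exists` callback and `simulated_lookup` function are hypothetical stand-ins for whatever check reveals that an ID is in use. The simulation below runs entirely offline against a made-up user count.

```python
from typing import Callable


def find_highest_id(id_exists: Callable[[int], bool]) -> int:
    """Return the largest ID for which id_exists is True, or 0 if none.

    Assumes IDs 1..N all exist and nothing above N does
    (an assumption for this sketch; real systems may have gaps).
    """
    if not id_exists(1):
        return 0
    low, high = 1, 2
    # Double the upper bound until it overshoots the newest ID.
    while id_exists(high):
        low, high = high, high * 2
    # Invariant: low exists, high does not. Halve the gap each step.
    while high - low > 1:
        mid = (low + high) // 2
        if id_exists(mid):
            low = mid
        else:
            high = mid
    return low


probes = []


def simulated_lookup(user_id: int, total_users: int = 4_500_000) -> bool:
    """Hypothetical stand-in for an existence check; records each probe."""
    probes.append(user_id)
    return 1 <= user_id <= total_users


highest = find_highest_id(simulated_lookup)
print(highest, len(probes))
```

The number of checks grows with the logarithm of the user count, which is why an estimate does not require running through every record.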