More than one petabyte of data stored in database technologies such as MongoDB is exposed to the public internet because end-user companies are failing to adequately secure their big data systems.
Security firm Binaryedge said on Friday it had discovered that many organisations using big data technologies were not employing basic recommended security controls.
As a result, more than one petabyte (one thousand terabytes) of data was available on the open web to anyone with the technical nous to know where to look, the firm wrote.
Binaryedge said it discovered almost 200,000 big data systems that were publicly addressable. Such systems were found running at a wide range of companies, from start-ups to Fortune 500 businesses.
Many of those identified were running out-of-date software and were not using basic security protections such as user authentication, the firm said.
Binaryedge said there were 39,134 publicly addressable MongoDB servers with no authentication at all, echoing earlier findings by Shodan founder John Matherly. A further 7267 required authentication but were still publicly addressable.
The 46,401 exposed MongoDB servers held 618 terabytes of data, which was stored in databases named 'local' and 'admin' among others, the firm wrote.
Binaryedge found 35,330 publicly addressable instances of the open-source Redis key-value store and cache, which likewise lacked any authentication. Those systems held around 13 terabytes of data in memory, the firm said.
The firm also discovered 118,574 exposed instances of the Memcached memory caching system, containing around 11 terabytes of data, and 8990 exposed instances of the Elasticsearch search engine, which held 531 terabytes.
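The headline petabyte figure can be checked against the per-technology numbers Binaryedge reported. The short Python tally below simply adds the reported data volumes (treating the "around" figures for Redis and Memcached as exact, which is an assumption) and the two MongoDB server counts. It uses only the numbers given in this article; the helper name `total_exposed_tb` is hypothetical.

```python
# Figures as reported by Binaryedge: terabytes of exposed data per technology.
EXPOSED_TB = {
    "MongoDB": 618,
    "Redis": 13,        # reported as "around 13 terabytes"
    "Memcached": 11,    # reported as "around 11 terabytes"
    "Elasticsearch": 531,
}

# The article uses decimal units: one petabyte = one thousand terabytes.
TB_PER_PB = 1000

# MongoDB servers with no authentication, plus those requiring authentication
# but still publicly addressable.
MONGODB_OPEN = 39_134
MONGODB_AUTH = 7_267


def total_exposed_tb(volumes: dict[str, int]) -> int:
    """Sum the reported per-technology data volumes, in terabytes."""
    return sum(volumes.values())


if __name__ == "__main__":
    total = total_exposed_tb(EXPOSED_TB)
    print(f"MongoDB servers: {MONGODB_OPEN + MONGODB_AUTH}")
    print(f"Total exposed: {total} TB = {total / TB_PER_PB:.3f} PB")
```

Running it prints 46401 MongoDB servers and a total of 1173 TB (1.173 PB), consistent with the "more than one petabyte" claim.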
Binaryedge did not interrogate the exposed systems it discovered, so it could not detail the types of data left vulnerable.
However, it said that with more than one petabyte of data publicly accessible, some of it in systems that hold constantly changing data, attackers would have access to a continuous stream of new information.
At the heart of the issue is the fact that many companies deploying these big data technologies are still figuring out how to use them and are not securing them by default, Binaryedge wrote.
They fail to recognise that the technologies are meant to be deployed only in secure environments and accessed only from secure clients, the firm said. Such guidance is laid out in the documentation accompanying the software, but Binaryedge noted that users were ignoring the advice.
"These technologies' default settings tend to have no configuration for authentication, encryption, authorisation or any other type of security controls that we take for granted. Some of them don't even have a built-in access control," Binaryedge wrote.
"[For example] Redis default configuration doesn't set any type of authentication and listens on all network interfaces as stated on the configuration file."
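To illustrate what an unauthenticated Redis server looks like from the network, here is a minimal Python sketch, assuming a server on Redis's default port 6379, that sends a single PING using the Redis wire protocol (RESP) and reports whether the server answered without credentials. The function names `classify_ping_reply` and `check_redis` are hypothetical, written for this example; it should only be run against servers you own or are authorised to test.

```python
import socket


def classify_ping_reply(reply: bytes) -> str:
    """Classify the first line of a Redis reply to an unauthenticated PING.

    "+PONG" means the server answered without any credentials. An error
    beginning "-NOAUTH" means a password has been configured. Anything else
    (including an empty reply) is reported as "other"; newer Redis versions,
    for example, may answer "-DENIED" when running in protected mode.
    """
    first_line = reply.split(b"\r\n", 1)[0]
    if first_line == b"+PONG":
        return "open"
    if first_line.startswith(b"-NOAUTH"):
        return "auth-required"
    return "other"


def check_redis(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send one RESP-encoded PING to host:port and classify the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # RESP array of one bulk string: the PING command.
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")
        reply = sock.recv(1024)
    return classify_ping_reply(reply)
```

For example, `check_redis("127.0.0.1")` would be expected to return `"open"` for a local server with no password set. On the server side, the relevant redis.conf directives are `bind`, which restricts the network interfaces Redis listens on, and `requirepass`, which sets a password.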
Already, Binaryedge said, an unknown actor has been connecting to MongoDB servers and creating databases named "DELETED_BECAUSE_YOU_DIDNT_PASSWORD_PROTECT_YOUR_MONGODB".
It said it had found a database with that name on 347 different IP addresses.
It said it was setting up an automated system to alert companies to exposed systems on their networks.