The US Department of Defense is the latest high-profile entity to leak vast amounts of information via unprotected Amazon Web Services cloud storage.
Security vendor UpGuard found three large AWS S3 storage buckets left wide open, containing online content scraped from sites over the past eight years.
The data comprises some 1.8 billion posts from the comments sections on news sites, web forums, and social networks by individuals in different countries and in multiple languages.
UpGuard decompressed the large files it found in the S3 buckets and discovered they contained indices for Apache Lucene, the open-source search library on which the Elasticsearch engine is built.
It appears the data was collected in bulk by US Central Command (CENTCOM) and US Pacific Command (PACOM), the military commands that co-ordinate operations in the Middle East, Asia and the South Pacific, UpGuard said.
Examining the data, UpGuard found "loose correlations" to regional US security concerns in countries such as Iraq and Pakistan. This suggests the information may have been collected for surveillance purposes.
As many of the posts come from within America, UpGuard says the data collection "raises serious concerns about the extent and legality of known Pentagon surveillance against US citizens".
UpGuard has documented multiple S3 data leaks in recent months by organisations such as Viacom, Accenture and Verizon.
Just last week the Australian Broadcasting Corporation exposed sensitive user data through the same S3 misconfiguration.