AI data hunger feeding a shadowy proxy ecosystem


Users often unaware of what agreeing to "bandwidth sharing" means.

Artificial intelligence (AI) companies are quietly driving demand for networks of co-opted consumer devices, as they seek to avoid blocking while scraping fresh data to train their models, recent research suggests.

The research comes after Google's Threat Intelligence Group (GTIG) threw the problem into sharp relief last month when it kneecapped the infrastructure of IPIDEA, a Chinese provider described as one of the largest residential proxy networks in the world.

A residential proxy routes internet traffic through a real home or small business internet connection.

This has the effect of making data requests appear to originate from an ordinary household rather than a corporate server, which in turn makes them look like human-generated traffic.

Unlike a virtual private network (VPN), which encrypts a user's own traffic between two endpoints, a residential proxy turns a consumer's device into a network exit node for someone else's traffic entirely.

Proxy network operators build these pools of home connections by paying app developers to embed proxy software development kits (SDKs) into otherwise unrelated applications, or by distributing standalone apps that promise users they can "monetise" their unused bandwidth.

An Israeli company, Bright Data, was previously reported to have pitched streaming TV operators on integrating an SDK into their apps and opting users into monetisation schemes.

In practice, most users have little idea what exactly their device is doing if it becomes part of a proxy network, including if it's being used for data scraping.

"Some free VPN providers include this bandwidth sharing in their terms of service," Maynard Koch, the chair of distributed and networked systems at Germany's Technische Universität Dresden, told iTnews.

"If you use such a 'free' service, you basically install proxy software on your device and become part of the proxy network," Koch added.

As users have agreed to the terms of service to use the software, providers can then claim that their proxies are "ethically sourced," Koch said.

AI drives demand

The commercial appeal of residential proxies has grown significantly as AI companies have begun using data scraping bots to harvest freshly published web content for model training.

Site operators have responded by blocking access from known corporate and data centre IP ranges, which has pushed some AI companies towards residential proxies, in the belief that traffic from these does not trigger detection.
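As a rough illustration of why this kind of blocking pushes scrapers towards home connections, the sketch below checks a client address against a list of blocked network ranges. The ranges are reserved documentation addresses standing in for data centre networks, and the function name is hypothetical; it is not any site operator's actual implementation.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737)
# used here as stand-ins for known data centre networks.
DATA_CENTRE_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DATA_CENTRE_RANGES)


# An address inside a listed range is refused.
print(is_blocked("192.0.2.10"))
# An address outside every listed range, such as a home connection
# acting as a proxy exit node, passes this check.
print(is_blocked("203.0.113.7"))
```

A check like this only looks at where the request comes from, so a request relayed through a household connection is not caught by it.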

Proxy providers have been quick to pivot their marketing accordingly.

US-based Olostep markets a data crawling service that is aimed at AI companies, offering monthly plans costing between US$9 and US$399, depending on the number of requests included.

Separately, the company offers developers an SDK called Mellowtel that can be embedded in apps to monetise users' bandwidth, in effect turning those users' connections into proxy nodes, often without clear disclosure of the purpose.

"Yes, AI definitely seems to be driving demand for residential proxies," Ben Brundage, founder of proxy tracking company Synthient, told iTnews.

The growth in demand, however, has not been matched by growth in legitimate supply.

"We also largely attribute the shrinking costs to unethically sourced proxy networks that rely on botnets such as IPIDEA," Brundage added.

Major AI vendor Perplexity.ai told iTnews last year that it does not use Olostep or Mellowtel.

However, Perplexity.ai and internet infrastructure vendor Cloudflare were engaged in a very public stoush last year over alleged AI bot evasion by Perplexity.ai, which it denied.

The risk to users

For end users, allowing a device to become a proxy exit node is potentially riskier than providers' marketing implies.

GTIG's analysis found that IPIDEA's SDK did not merely route external traffic through a device.

It also sent traffic back to the device, creating a pathway for attackers to reach other devices on the same home or corporate network.

The Kimwolf botnet separately exploited a vulnerability in IPIDEA's infrastructure to tunnel back through the proxy network and compromise local network devices.

IPIDEA's network was also used for distributed denial-of-service attacks and as command-and-control infrastructure for malware, including the BadBox 2.0 and Aisuru botnets.

The Chinese company has denied being the operator or controller of the BadBox 2.0 botnet, and said it acted quickly to close Kimwolf's access after receiving a vulnerability report from Synthient in late December 2025.

Proxy numbers overstated

Before GTIG's action took IPIDEA's website offline, the company advertised nearly 900,000 proxy servers available in Australia and almost 150,000 in New Zealand, iTnews observed.

Those figures are, to put it charitably, overstated.

"Typically proxy providers inflate their actual numbers for buyers," Brundage said.

"In the last seven days, across New Zealand and Australia alone, we've seen 190,052 distinct IP addresses from all residential proxy providers," he added.

"When isolating to just IPIDEA we've seen 50,902 unique IPs in those last seven days."

The real numbers are still substantial for a relatively small market such as Australia and New Zealand.
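The size of the gap can be worked out from the figures quoted above. The calculation below is a back-of-the-envelope sketch: the advertised values are rounded up from "nearly" and "almost", and advertised proxy servers are not the same measure as distinct IP addresses seen over seven days, so the ratio is indicative only.

```python
# Figures as quoted in the article. The advertised values are approximate
# and are treated here as upper bounds.
advertised_au = 900_000       # "nearly 900,000" advertised for Australia
advertised_nz = 150_000       # "almost 150,000" advertised for New Zealand
observed_ipidea_7d = 50_902   # Synthient: IPIDEA only, AU and NZ, seven days
observed_all_7d = 190_052     # Synthient: all providers, AU and NZ, seven days

advertised_total = advertised_au + advertised_nz

# How many times larger the advertised pool is than the observed one.
inflation_factor = advertised_total / observed_ipidea_7d

# IPIDEA's share of all residential proxy IPs observed in the region.
ipidea_share = observed_ipidea_7d / observed_all_7d

print(f"Advertised pool is roughly {inflation_factor:.1f} times the observed pool")
print(f"IPIDEA share of observed IPs: {ipidea_share:.1%}")
```

On these numbers the advertised pool is about 20 times the observed one, while IPIDEA still accounts for roughly a quarter of the residential proxy addresses Synthient saw in the two countries.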

Takedowns don't clear proxy pool

Ben Dowling, co-chief executive of IP address intelligence firm IPInfo, provided data to iTnews suggesting that brand-level enforcement action like the IPIDEA disruption does not remove the underlying devices from circulation.

Analysis of IPIDEA-affiliated providers shows a 74 to 88 percent IP address overlap with at least 11 other residential proxy services.

Key providers including 2captcha were at 87.9 percent overlap, sparkproxies at 87.69 percent, and mangoproxy at 85.31 percent.

Across all residential proxy networks, 46 percent of IP addresses appear simultaneously in multiple provider pools, with individual IPs observed across as many as 101 different provider pools at once.
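Overlap figures of this kind can be derived by comparing the sets of addresses seen in each provider's pool. The sketch below shows one way to do that, assuming overlap means the share of one pool's addresses that also appear in another; IPInfo's exact methodology is not described, and the provider names and addresses here are made up for illustration.

```python
from collections import Counter


def overlap_pct(pool_a: set[str], pool_b: set[str]) -> float:
    """Percentage of pool_a's addresses that also appear in pool_b."""
    if not pool_a:
        return 0.0
    return 100.0 * len(pool_a & pool_b) / len(pool_a)


def multi_pool_share(pools: dict[str, set[str]]) -> float:
    """Percentage of all distinct addresses seen in more than one pool."""
    counts = Counter(ip for pool in pools.values() for ip in pool)
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return 100.0 * shared / len(counts)


# Hypothetical pools built from reserved documentation addresses.
pools = {
    "provider_a": {"203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"},
    "provider_b": {"203.0.113.2", "203.0.113.3", "203.0.113.4", "203.0.113.5"},
    "provider_c": {"203.0.113.9"},
}

print(overlap_pct(pools["provider_a"], pools["provider_b"]))
print(multi_pool_share(pools))
```

In this toy data, three of provider_a's four addresses also sit in provider_b's pool, and half of all distinct addresses appear in more than one pool.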

In a 90-day period across just six tracked providers, IPInfo identified nearly 79 million unique IPs globally.

"A takedown focused on a specific brand does not clear the IP addresses from the market," Dowling said.

"Those same devices continue to be monetised by sister providers."

That resilience was on display almost immediately after GTIG's announcement on January 29.

Active IP counts for linked providers 922proxy and PyProxy cratered by 99.9 percent and 99.87 percent respectively, and their DNS configurations were removed on February 4.

Despite that, their backend servers remained operational: by connecting directly to server IP addresses rather than hostnames, the network resumed data collection, indicating the operators were shifting methods rather than shutting down.

Common operator suspected in the background

Technical analysis of the backend infrastructure points to a concentration of control.

Five of the providers named by Google - 922Proxy, PyProxy, ABC Proxy, PIA S5 Proxy, and Luna Proxy - all use the same network operator for their backends.

IPInfo says this shared infrastructure strongly suggests a common operator managing multiple brands behind the scenes. 

Residential proxies unlikely to evade bot detection

One attraction of residential proxies is that traffic coming from them may be more likely to be judged human-generated, and so less likely to be blocked as bot traffic.

Cloudflare, which partnered with GTIG on the IPIDEA disruption and operates a bot detection service, said the residential IP angle is less decisive than proxy providers claim.

"There's a lot of signal, other than just the IP address, that reveals the difference between human and non-human traffic," Cloudflare spokesperson Daniella Valrupalli said.

"Cloudflare is able to look at millions of characteristics of crawling behaviour in order to determine what is bot and what is human traffic, regardless of what IP addresses, residential or otherwise, the traffic is coming from," she added.

Copyright © iTnews.com.au. All rights reserved.