BitTorrent researchers stand by sponsored study

 

Call for others to replicate.

One of the authors of a contentious study of BitTorrent trackers effectively dismissed a critique of the work by a major torrent blog overnight, challenging the critics to produce some "comparable research".

The Internet Commerce Security Laboratory (ICSL) study, partly paid for by Village Roadshow, found "at least 89.9 percent" of a sample of 1,000 popular torrents infringed copyright.

The study was lampooned by file-sharing blog Torrent Freak, which claimed it was riddled with mistakes and that its conclusions were based on "painfully inaccurate data".

ICSL head Paul Watters forwarded what appeared to be part of written correspondence sent directly to the Torrent Freak site.

It thanked Torrent Freak "for your enquiry" and conceded the site "raised some interesting points that are fundamental to the validity of any study in this area: the sampling strategy; verification of results and so on."

"As researchers, we not only stand by the findings that we have arrived at, but - having made our methodology public - we are providing other bona fide researchers [the opportunity] to replicate and/or dispute our findings," Watters said.

"Their results can in turn be assessed through the peer review process; this is the process that normal research activity takes.

"We believe that our methodology was rigorously applied to the sample that we obtained. Over time, we will replicate the sampling process, so that we will gain better estimates of the population results.

"This is the fundamental tenet of statistical sampling; I would be happy to send you a complimentary [copy] of my O'Reilly 'Statistics [in] a Nutshell' book that might give further insight into statistical methodology.

"We look forward to reading the results of any comparable research that you produce!"

Watters didn't appear to have responded directly to Torrent Freak's criticisms of the methodology itself.

The study was released just over a week before the film studio that commissioned it heads back to the Federal Court to appeal an unfavourable decision in its long-running case against ISP iiNet.

The film industry wants to make iiNet and other ISPs responsible for the copyright-infringing actions of internet users on their networks.


Comments: 9
ITrant
Jul 27, 2010 1:28 PM
Without equally funded peer review studies, this report is meaningless. Corporations have long realised they can 'write the history' if they fund a study.

The Torrent Freak author taught statistics and research methods to PhD students and has far greater specialist knowledge. How about Village Roadshow fund that review of the report?
Ezy2Confuze
Jul 27, 2010 2:33 PM
Blind Freddy can see AFACT using this against iiNet.

When Conjob loses his seat in the upcoming election, he should apply for a job at AFACT. It's a perfect fit: neither of them truly knows what they are talking about, both try to pull the wool over the public's eyes at every turn and both of them know how to twist words just the right way to fit their own agendas.
Cham
Jul 27, 2010 2:35 PM
The impression I got from the TorrentFreak article wasn't just that they disagreed with the findings of the study, but that they disagreed with the methodology as well. It's all well and good to say "our methodology supports our results" but if the methodology is wrong, then the results are wrong too.
Bloemfontein78
Jul 27, 2010 2:52 PM
@Cham - actually read the study. The methodology is fairly grounded. Just because Ernesto says so, doesn't make it so.... Independent thought is a good start

@Ezy2Confuze - I thought it had already been covered elsewhere that AFACT can't use it in court?

@ITrant:
Has far greater specialist knowledge than who? Eeek.



1. Ernesto claims he doesn't know how the researchers came to a 1 million + torrent figure. From my reading of the research, it is clearly outlined in the methodology. The fact that OpenBitTorrent lists over 2 million torrents is pointless if it doesn't allow a full tracker scrape or the researchers were unable to obtain a full scrape. Did Ernesto try to do this (or replicate any part of the study)?

2. That the categorisation process is flawed is utter crap. I'm a post grad in Maths, and a sample set of 15000 is enough to get a statistically sound sample.

Anyways, TorrentFreak misinterprets (doesn't read) all the data from the IsoHunt gospel. In IsoHunt reporting, they state only 127168 torrents are online (http://isohunt.com/stats.php?mode=btSites), so > 10% of the total currently traded torrents is a huge statistical sample set, and one would think, contains books, and all the other things that they claim are under-represented. How many people download an eBook on BT anyway?

3. The seed count issue they make at point 2 and 3 is probably the only valid point they (TorrentFreak) make. And the top 100 may be loaded with fakes. That said, TorrentFreak rely solely on IsoHunt data and what it is reporting. Hardly a scientific basis for critical analysis.

4. TorrentFreak/Ernesto deliberately misrepresent the data of the study. Rather than taking the lead out of the Exec Summ from the initial research, they used the upper threshold of the analysis (whereas every other news agency mentions the 89%). The report erred on the side of caution when they state 89% of BT was infringing, up to 97.9%, but there was a large unknown. TorrentFreak clearly uses the high figure to make their point, skewing the interpretation of the data. If TF want to seriously critically analyse the data, they should be referring to the complete subset of data - not cherry picking.

The fact Ernesto taught stats once upon a time makes no difference. He is deliberately ignoring or skewing data in the initial research paper for his own (read: his/TorrentFreak/file sharers) purpose.

Actually read the research and attempt to understand it before criticising it based on what the blogosphere claims.
torrentfreak
Jul 27, 2010 6:47 PM
@Bloemfontein78

1. Fair point. Personally I found the mention of 1 million torrents misleading in reply to the question of how many torrents are shared. Also, they estimate that this figure would increase at a lower rate with more trackers being added, but this is obviously not the case.

2. The categorization process IS flawed (based on a biased sample). The sample is taken from the most seeded torrents and not all the torrents they found through their scrape. Because of this some categories are overrepresented, and others under (e.g. movies have more seeders than books generally). The size of the sample is irrelevant if your selection method (most peers) is biased.

How many people download ebooks? http://www.kickasstorrents.com/browse/

3. TorrentFreak is not relying on isoHunt, that was merely used as an example. We actually have a few machines dedicated to tracking all torrents and downloads for our weekly charts. I'd like to think that the system we built and optimized in the last 3 years is pretty good.

4. Ernesto's not misrepresenting anything. The authors do state that only 0.3% was confirmed legal.

The researchers should be ashamed of themselves for posting this weak and misleading report.
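
The selection-bias argument in point 2 can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the two categories, the log-normal seeder distributions and every number in it are hypothetical and are not taken from the ICSL study or from TorrentFreak's data. It compares each category's share in the whole population, in the most-seeded slice, and in a simple random sample of the same size.

```python
import random
from collections import Counter


def category_shares(torrents):
    """Return each category's share of a list of (category, seeders) pairs."""
    counts = Counter(category for category, _ in torrents)
    total = len(torrents)
    return {category: count / total for category, count in counts.items()}


def build_population(rng, per_category=5000):
    """Hypothetical population in which movies tend to have more seeders than books."""
    population = []
    for category, mu in (("movies", 4.0), ("books", 2.0)):
        for _ in range(per_category):
            seeders = int(rng.lognormvariate(mu, 1.0))
            population.append((category, seeders))
    return population


rng = random.Random(42)  # fixed seed so the run is repeatable
population = build_population(rng)
sample_size = 1000  # 10% of the hypothetical population

# Selection by popularity: keep only the most-seeded torrents.
top_seeded = sorted(population, key=lambda t: t[1], reverse=True)[:sample_size]
# Simple random sample of the same size, for comparison.
random_sample = rng.sample(population, sample_size)

population_shares = category_shares(population)
top_shares = category_shares(top_seeded)
random_shares = category_shares(random_sample)

for label, shares in (
    ("population", population_shares),
    ("most seeded", top_shares),
    ("random sample", random_shares),
):
    print(f"{label:>13}: books = {shares.get('books', 0.0):.1%}")
```

In this toy setup books are half of the population, yet they almost disappear from the most-seeded slice, while the random sample stays close to the true share. It says nothing about how large the effect was in the actual study; it only shows why sample size alone does not correct a popularity-based selection.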

Bloemfontein78
Jul 27, 2010 9:28 PM
@torrentfreak - thanks for the reply.

1. Without further research, and given the ISOHunt bible states there are little over 100,000 torrents being actively seeded (shared; much less than the 5 million tracked) - see summary @ http://isohunt.com/stats.php?mode=btSites, the study is correct if you take at least one seeder as being a prerequisite for the file to be shared. So yes, there would be a law of diminishing returns.

2. See the point above about requiring at least one seed for a torrent to be counted as shared. 15000 is 15% of the population which is huge. If you look at the distribution, I would personally hypothesize that the distribution would stay roughly the same, although without conducting my own study, I cannot confirm. The researchers could try a random sample also in the next iteration of their study. Or TF with their optimized system could produce some information?

The link you post to browse kickass torrents doesn't give any indication of torrents being actively seeded.... This is something for further exploration - the bias with 15% of the population sampled is minimal at best.

3. Fair call, although that is not clear from any information which was actually posted on your site. However, I would suggest such a system which you possess could be used in order to produce a comparable survey.

4. Actually, he does:
"Here the researchers conclude that 97.9% of all files on BitTorrent are copyright infringing, and only 0.3% confirmed ‘legal’"

Incorrect. The study claims 97.9% when the ambiguous titles were non-infringing and porn was not included. So he's presenting two separate sets of the result as one coherent result. The 97.9% section was set when porn was not included and the 16 ambiguous titles were considered non-infringing, thereby allowing 2.1% of non-infringing. The 0.3% was when considering the entire population sample including porn and leaving ambiguous titles as such.

There is further research to be done, but certainly the researchers were right to stand by the study. The methodology was open, and plain to see. TF/Ernesto jumped the gun and misrepresented the data. Princeton University claimed 99%+ was infringing without publishing any methodology yet didn't come in for nearly the same criticism.

In most instances, the published study erred on the side of caution (with the exception of seed counts), so perhaps TF should show more respect to the academic process rather than merely paying lip service, misrepresenting the data presented and not providing any real proof beyond citing IsoHunt's stats.

To call this 'one of the most inaccurate reports we’ve seen thus far' is by far an exaggeration given the open way it was conducted and the results published.

It is TorrentFreak and Ernesto, who evidently taught stats, who should be ashamed for providing a misconstrued representation of the study as facts.
david.price
Jul 28, 2010 8:15 AM
I've been responsible for a more extensive study along these lines, examining the 10,000 most popular torrents tracked by OpenBitTorrent. We hope to publish soon but a brief breakdown is below.

I'm not here to enter the methodology arguments that surround the ICSL study outlined above. I think there may be errors in there, particularly in error checking some of the torrent swarms found to be most popular (for instance, The Incredible Hulk, a modestly popular film released in 2008, was the most seeded file on bittorrent in April 2010 with 1 *million* seeds?). But the main point to make is that the research we have produced found a significant amount of copyrighted content as well, across a sample of torrents ten times greater.

To summarise our study: we took a full scrape of OpenBT (the largest tracker when we scraped at about 1.9m unique infohashes). We then sorted these infohashes by number of *leechers*, not seeds (one reason for this is that fake files / malware are most often promoted as having many, many seeds in order to attract downloaders). We then took the top 10,000 torrents as ordered by leechers and:
i. checked infohashes against two portals
ii. weeded out fakes (a largely manual process)
iii. checked all infohashes up to the 2,000th most popular swarm against google. In the end, all files up to the 2,000th most popular infohash were identified.
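
The ranking step described above (sort by leechers rather than seeds, then take the top N) can be sketched as follows. The record layout, the helper name `top_by_leechers` and the four sample swarms are all hypothetical; none of it comes from the study's own tooling or data.

```python
from typing import NamedTuple


class SwarmStats(NamedTuple):
    infohash: str  # placeholder identifiers, not real infohashes
    seeders: int
    leechers: int


def top_by_leechers(scrape, n):
    """Return the n swarms with the most leechers (ties broken by infohash)."""
    return sorted(scrape, key=lambda s: (-s.leechers, s.infohash))[:n]


# Made-up scrape: "aaa" mimics a fake that advertises an inflated seed count.
scrape = [
    SwarmStats("aaa", seeders=1_000_000, leechers=3),
    SwarmStats("bbb", seeders=900, leechers=4_200),
    SwarmStats("ccc", seeders=12_000, leechers=1_500),
    SwarmStats("ddd", seeders=40, leechers=0),
]

by_leechers = [s.infohash for s in top_by_leechers(scrape, 2)]
by_seeders = [s.infohash for s in sorted(scrape, key=lambda s: -s.seeders)[:2]]

print("top 2 by leechers:", by_leechers)
print("top 2 by seeders: ", by_seeders)
```

With these invented numbers the swarm with the inflated seed count leads the seed ranking but drops out of the leecher ranking, which is the motivation given for sorting by leechers.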

That was a fairly long process, as you might imagine. We then categorised, analysed, etc.

So of the top 10,000:
+ 15.1% was identified as pornography
+ 25.3% of infohashes could not be identified by checks against the two portals and in some cases, Google. Subsequent analysis has led us to think that a good deal of this content is pornography (and a large amount of it appears to be found mostly on Asian forums - remember that EZTV research that found Xunlei the most used torrent client? Looks like they could all be after pornography).
+ 283 torrents were classified as fake

So after that we were left with 5,677 torrents which were non-pornography and identified. Of all these 5,677 infohashes, *only one* was non-copyrighted content - a Linux distribution.

As such, *at least* 56.76% of the content we found in the most popular 10,000 files on bittorrent was copyrighted and being shared illegitimately. I *could* say "99% of the infohashes we identified and which were not pornography were copyrighted" but I see where that kind of statement got the authors of the study above, so I won't.
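
The arithmetic behind the 5,677 and 56.76% figures can be checked directly from the numbers quoted in this comment. The variable names are illustrative only, and the counts for pornography and unidentified torrents are derived from the published percentages of the 10,000-torrent sample.

```python
top_n = 10_000

pornography = 1_510   # 15.1% of 10,000
unidentified = 2_530  # 25.3% of 10,000
fakes = 283

# Torrents that were identified and were not pornography or fakes.
identified_non_porn = top_n - pornography - unidentified - fakes  # 5,677

non_copyrighted = 1  # the single Linux distribution
copyrighted = identified_non_porn - non_copyrighted  # 5,676

# Lower bound: copyrighted torrents as a share of the whole top 10,000.
lower_bound_pct = 100 * copyrighted / top_n
# The alternative framing: share of the identified, non-pornographic torrents only.
share_of_identified_pct = 100 * copyrighted / identified_non_porn

print(f"identified, non-pornography: {identified_non_porn}")
print(f"lower bound across top 10,000: {lower_bound_pct:.2f}%")
print(f"share of identified torrents: {share_of_identified_pct:.2f}%")
```

The two percentages answer different questions: the first treats everything unidentified, pornographic or fake as not shown to be infringing, while the second ignores those torrents entirely.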

So to break down the top 10,000:
+ 25.3% was unknown
+ 15.1% was identified pornography, copyrighted status unknown and unexplored
+ 27.8% was films, all identified, all copyrighted
+ 14.8% was television (we sampled on a Tuesday morning from what I remember and Monday night's episode of House was the most in-demand single infohash), all identified, all copyrighted
+ 7.8% was games, all identified, all copyrighted
+ 4.1% was music, all identified, all copyrighted
The rest was a smattering of software, books, comics, a few sports broadcasts (UFC is very popular).
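
As a rough consistency check, the listed shares can be summed to see how much of the top 10,000 is left for "the rest". This sketch assumes that the 283 fakes are not part of any listed category, which the comment does not state explicitly; the published percentages are also rounded, so the result is approximate.

```python
# Published shares of the top 10,000, in hundredths of a percent
# (integers avoid floating-point drift when summing).
listed_shares = {
    "unknown": 2530,
    "pornography": 1510,
    "films": 2780,
    "television": 1480,
    "games": 780,
    "music": 410,
}
fakes = 283  # 283 of 10,000 torrents, i.e. 2.83% (assumed to sit outside the list)

listed_total = sum(listed_shares.values())
remainder = 10_000 - listed_total
other_content = remainder - fakes

print(f"listed categories: {listed_total / 100:.1f}%")
print(f"not listed: {remainder / 100:.1f}%")
print(f"left for software, books, comics, sport: about {other_content / 100:.2f}%")
```

Under that assumption roughly 2.3% remains for the smaller categories. That fits the 56.76% lower bound quoted earlier: films, television, games and music add up to 54.5%, leaving about 2.26% of copyrighted material in other categories plus the one Linux distribution.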

A few last stats:
+ the top 10,000 torrents sorted by leecher comprised 35.5% of all peers (seeds and leechers together). Top 10,000 torrents = only 0.54% of total torrents tracked by OpenBT.
+ the top 10,000 torrents represented 44.8% of all leechers - that is to say, nearly half of all active downloaders were only interested in 0.54% of the content.
+ over half of all infohashes tracked by OpenBT had no downloaders at all at the point of scrape
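
The concentration figures above can be cross-checked against the "about 1.9m unique infohashes" quoted for the scrape. The sketch below uses only numbers from this comment; the variable names are illustrative, and because the inputs are rounded the results are approximate.

```python
top_n = 10_000
share_of_tracked_pct = 0.54   # top 10,000 as a share of all tracked torrents
share_of_leechers_pct = 44.8  # share of all leechers found in the top 10,000

# Total number of tracked torrents implied by the 0.54% figure.
implied_total = top_n / (share_of_tracked_pct / 100)

# Share the top 10,000 would have if the total were exactly 1.9 million.
share_if_1_9m_pct = 100 * top_n / 1_900_000

# How over-represented the top torrents are among active downloaders.
concentration_factor = share_of_leechers_pct / share_of_tracked_pct

print(f"implied total torrents: {implied_total:,.0f}")
print(f"share of a 1.9m total: {share_if_1_9m_pct:.2f}%")
print(f"leecher concentration: about {concentration_factor:.0f}x")
```

The implied total of roughly 1.85 million is in line with "about 1.9m", and the top torrents attract about 83 times the leechers that an even spread across all tracked torrents would give them.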

As I say, we hope to publish all this in more detail soon - with a full methodology...
Ace
Jul 28, 2010 12:11 PM
Nice work people. However, are ISPs responsible for the software people run on their PCs at home? Can they legitimately snoop on traffic? If torrents are encrypted, would that even be possible?

There is clearly a problem, but short of raiding people's houses and inspecting their PCs, it is difficult to see what could be done about it.
anonymous
Jul 28, 2010 1:37 PM

@Ace, surely it is unthinkable that the torrent police could be allowed to - oh, just a minute, somebody's pounding on my front door. . .