ABS hobbles census data downloaders


DVDs charged at $250 a pop.

The Australian Bureau of Statistics has released the latest census data for free under a Creative Commons license, but appears to be steering people towards a $250 mailed-out DVD rather than making it easy to download the information directly over the internet.


Programmer and freelance journalist Grahame Bowland, who first noticed the issue, said the government agency is going to great lengths to discourage people from downloading the files directly, by dint of a convoluted site layout and JavaScript functions that obfuscate file paths.

Bowland discovered how difficult it was to download the data packs for free when he tried to do it, with the ABS forcing him to jump through several hoops before accessing the data.

First, people have to register on a hard-to-find page, Bowland said, after which they are redirected to another page with a big matrix of data packs.


"You have to click to download each pack individually, and they've set the site up deliberately to make it difficult to use a browser plugin to download everything that is contained on the released DVD image," Bowland told iTnews.

The ABS developers have added JavaScript to the site that obfuscates the file paths, and left comments in the code explaining why. Bowland provided iTnews with the commented code sample below, taken directly from the agency's site:

// Function: guidGenerator
// Description:returns a pseudo-random GUID
//This is appended to a url for 2 reasons
//1. to make the URL unique, so that the browser always gets it and doesn't use a cached version
//2. to make a URL look like its got a unique key, in a naive attempt to fool a not-so-wily hacker
//into thinking they can't download a datapack directly if they know the URL pattern, because they
//need a unique key.
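The body of the function was not included in the sample, only its comments. As a rough illustration of what a function matching that description might look like, here is a minimal sketch. It is an assumption written for this article, not the ABS's code: the implementation and the GUID layout are hypothetical, and only the name `guidGenerator` comes from the comment above.

```javascript
// Hypothetical sketch only: NOT the ABS's actual implementation.
// Returns a pseudo-random, GUID-shaped string (8-4-4-4-12 hex digits).
function guidGenerator() {
  // Produce four pseudo-random hex digits. Adding 1 before scaling
  // guarantees a five-digit hex string, so dropping the first character
  // always leaves exactly four digits (leading zeros included).
  const s4 = () =>
    Math.floor((1 + Math.random()) * 0x10000)
      .toString(16)
      .substring(1);

  return `${s4()}${s4()}-${s4()}-${s4()}-${s4()}-${s4()}${s4()}${s4()}`;
}
```

A value built this way comes from `Math.random()`, which is not a cryptographic source, and nothing in the sketch registers the value anywhere. It only looks like a key, which is consistent with the "naive attempt" the comment describes.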
 

"The ABS is trying to obfuscate paths to make it hard for people to bulk download the data, and labelling people who want to 'hackers'," Bowland said.

Bowland said that on closer inspection of the code, it appears you don't actually have to register at all to get the data packs, as you can use a JavaScript code snippet to generate the URLs of all the data packs.

To discourage this, the ABS JavaScript coders append a randomly generated number to the URLs so that they appear to require a complex key, a measure their own code comments call a "pathetic attempt" at stopping direct data pack downloads, as shown below:

// Function: getZip
// Parameters:fileName - the file to be downloaded.
//There are basically 2 formats, normal DataPacks and boundary files. For example:
//* 2010_BCP_SA1_for_Vic_short-header.zip
//* 2010_SA4_POW_shape.zip  or without the POW eg. 2010_SA4_shape.zip
// Descrition:
// This function is ultimately fired when a user clicks on a download link on the DataPacks download page.
// It does 4 things:
//
// Step 1. Get some dynamic values from the page. These were substituted when the page was created.
//'dpserver' is the full domino path, as seen in the dominoserver.properties value DataPacks.DominoServerExt
//  eg. http://www.censusdata.idev.abs.gov.au/CensusOutput/copsubdatapacks.nsf/All%20docs%20by%20catNo,
//      Also, generate a random number, which we append to the URL, to make it appear as if a complex
//key is required. This is a pathetic attempt to discourage someone from downloading the ZIPs
//directly (ie. without having to login), if they deduce the URL pattern.
//It's also used to make every URL unique, so that the browser always sends the request to the server,
//(ie. doesn't use its cached version), because we want to know about every click.
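To illustrate why a random suffix of this kind works as a cache-buster but not as an access control, the following minimal sketch builds two download URLs for the same file and shows that they differ only in the query string. Everything here is an assumption for illustration: the base URL is a placeholder, and the function names and the `rand` parameter name are hypothetical. Only the file name is taken from the comment above.

```javascript
// Hypothetical sketch: none of these names or URLs are the ABS's.
// Builds a download URL with a throwaway random number appended, so that
// every request looks unique to the browser cache.
function buildDownloadUrl(baseUrl, fileName) {
  const random = Math.floor(Math.random() * 1e9);
  return `${baseUrl}/${encodeURIComponent(fileName)}?rand=${random}`;
}

// Returns the part of a URL that identifies the resource itself,
// ignoring the query string.
function resourceOf(url) {
  const parsed = new URL(url);
  return parsed.origin + parsed.pathname;
}

// Placeholder host; the file name is one of the examples in the comment.
const base = "https://example.invalid/datapacks";
const first = buildDownloadUrl(base, "2010_SA4_shape.zip");
const second = buildDownloadUrl(base, "2010_SA4_shape.zip");

// Both URLs name the same file; only the random suffix can differ.
console.log(resourceOf(first) === resourceOf(second));
```

If the server does not check the appended value, as the comment implies, anyone who works out the file naming pattern can supply any number they like.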

Despite having got this far, Bowland noted that some additional geometry files for DVD 3 couldn't be found on the ABS website, so he decided to stump up $250 for all the releases to be mailed to him.

According to Bowland, the high cost set to post out an optical disc means "in reality, they're subsidising internal admin roles by selling DVDs".

For those who don't wish to tangle with obfuscating JavaScript or pay hefty charges for DVDs, Bowland has made the census data available for download via BitTorrent on his website.

ABS responds

A spokesperson for the ABS said the $250 charge for the DVDs was to "recover administration costs" but pointed out that this was the first time its census data had been made free and available to everyone via its website.

According to the spokesperson, the ABS has worked hard to reduce the costs since 2006, when similar datapacks cost $805.

As for the convoluted download site layout with registration and obfuscated file paths, the spokesperson said there was room for improvement.

"The ABS is constantly looking at ways it can simplify the website and enhance the user experience," iTnews was told via email.

"We will shortly be conducting a review of all census products and services and will engage users of census data to better understand their needs," the spokesperson added.

Copyright © iTnews.com.au . All rights reserved.