When radio telescopes in the Square Kilometre Array (SKA) begin producing exabytes of data in 2024, 99 percent of that raw data will be deleted.
The loss of potentially valuable information is “unfortunate” but, according to CSIRO scientist Tim Cornwell, unavoidable given the cost of data storage.
Instead of storing exabytes of new data each week, raw data from the SKA will be processed by a supercomputer into images and time-series information that will be stored and used by researchers.
This will reduce the project’s storage requirements to tens of petabytes a week – equivalent to the total capacity of some of today’s largest data centres.
“We will only store data products,” said Cornwell, the computing project lead for the Australian Square Kilometre Array Pathfinder (ASKAP), a 36-dish array that will launch in 2013.
“There’s too much raw data so [researchers] don’t get access to that. It’s unfortunate, but the cost-benefit is very much in favour of throwing [raw data] away.”
Cornwell’s team at the CSIRO is developing software to produce scientific images and data about the strengths, frequencies and locations of sources of radiation in real time.
For the ASKAP, CSIRO’s algorithms will process some 2.4 gigabytes of data a second and require 100 teraFLOPS of computing grunt from the purpose-built, petascale Pawsey Centre.
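To put the ASKAP figure in perspective, the quoted rate can be turned into daily and weekly volumes. The following is a rough sketch only: it assumes decimal units (1 gigabyte = 10^9 bytes) and a constant, uninterrupted data rate, neither of which is stated in the article, and the function and variable names are illustrative.

```python
# Back-of-envelope conversion of a sustained data rate into stored volume.
# Assumptions (not from the article): decimal units and a constant rate.

SECONDS_PER_DAY = 24 * 60 * 60


def volume_bytes(rate_bytes_per_s, seconds):
    """Total bytes produced at a constant rate over a period."""
    return rate_bytes_per_s * seconds


ASKAP_RATE = 2.4e9  # 2.4 gigabytes a second, as quoted for ASKAP

per_day_tb = volume_bytes(ASKAP_RATE, SECONDS_PER_DAY) / 1e12
per_week_pb = volume_bytes(ASKAP_RATE, 7 * SECONDS_PER_DAY) / 1e15

print(f"Per day:  {per_day_tb:.2f} TB")
print(f"Per week: {per_week_pb:.2f} PB")
```

On those assumptions, the 36-dish pathfinder alone would handle a little over 200 terabytes a day, or roughly 1.5 petabytes a week.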
Phase one of the SKA will deliver about a tenth of the array’s full capacity by 2019 and require a 100-petaFLOPS machine.
By 2024, the full, 3000-dish SKA will require an exascale supercomputer – more than a hundred times faster than today’s most powerful machines.
Cornwell noted that the major hurdle will not be FLOPS (floating-point operations per second) but ‘memory bandwidth’: the rate at which data can be moved into and out of computing cores.
To that end, CSIRO aims to build algorithms that utilise many cores in parallel.
Power consumption is likely to be another major challenge, with existing eight-petaFLOPS machines drawing almost ten megawatts of electricity at a cost of about $10 million a year.
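The power figures can be sanity-checked with simple arithmetic. The sketch below rounds the quoted numbers to exactly ten megawatts and $10 million, assumes the machine runs at full draw all year, and then scales power linearly with performance. That last step is a deliberately naive assumption used only to show the size of the problem, not a forecast of how an exascale machine would actually be built.

```python
# Rough check of the quoted power figures. All inputs are rounded from the
# article; continuous full-load operation and linear scaling are assumptions.

HOURS_PER_YEAR = 24 * 365

power_mw = 10.0           # "almost ten megawatts"
annual_cost = 10_000_000  # "about $10 million a year"
machine_pflops = 8.0      # "eight-petaflop machines"

# Energy used in a year, in kilowatt-hours (1 MW = 1000 kW).
annual_kwh = power_mw * 1000 * HOURS_PER_YEAR

# Electricity price implied by the quoted cost.
implied_price_per_kwh = annual_cost / annual_kwh

# Naive linear scaling to an exascale (1000-petaFLOPS) machine.
mw_per_pflops = power_mw / machine_pflops
naive_exascale_mw = mw_per_pflops * 1000

print(f"Annual energy: {annual_kwh:,.0f} kWh")
print(f"Implied price: ${implied_price_per_kwh:.3f} per kWh")
print(f"Naive exascale draw: {naive_exascale_mw:,.0f} MW")
```

At today’s efficiency, an exascale machine would draw more than a gigawatt on this crude scaling, which is why power consumption ranks alongside data movement as a key obstacle.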
“The key roadblocks are things like power consumption on the multicore chips as you build up heat, and how you move data around,” Australian bid director Brian Boyle said.
“There are roadblocks to be gotten past, but the computing industry is bullish about it ... If we extrapolate from our progress, we don’t see a reason why we wouldn’t get to SKA data rates.
“IBM has told us that exascale computing will be around in 2018, and it will be a Thursday,” he joked.
Besides IBM, the anzSKA team responsible for Australia and New Zealand’s bid to host the array has been working with HP, Cray and Silicon Graphics on potential supercomputing technology.
The anzSKA industry consortium also includes Cisco, Sun Microsystems, the Federal Department of Innovation, CSIRO and supercomputing consortium iVEC.
Many of those vendors also have separate teams working with the competing Southern Africa bid. International SKA organisers will decide between Australia and South Africa in February.
Peter Quinn, director of the International Centre for Radio Astronomy Research, estimated that a third of the SKA funds could be spent on ICT, generating a $670 million business opportunity for technology vendors.
IBM Australia’s research and development director Glenn Wightwick said the company began working with the anzSKA consortium five years ago, and had about eight staff in Australia and an equal number elsewhere working on the project.
“We find the problem very compelling,” said Wightwick, noting that existing data management technology like IBM’s General Parallel File System was unlikely to scale to meet the SKA’s demands.
“What you would do is access all survey data and search for objects of interest and changes. The constant comparison implies enormous computing requirements.
“You’ll probably want to have processing elements embedded into the data flow, but there are also operations akin to data mining.”
Wightwick hoped an Australia-based SKA would inspire more people to pursue careers in science, technology, engineering and mathematics.
But should Australia be unsuccessful in bidding to host the SKA, “it’s not like work gets wasted”, he said.
Boyle agreed, estimating a “seven-year lag” between the development of cutting-edge supercomputers for research and the use of such machines in high-end enterprises.