In the next decade, astronomers expect to be processing 10 million gigabytes of data every hour from the Square Kilometre Array telescope.
Now with DNA sequencing getting cheaper, scientists will be data mining possibly hundreds of thousands of personal human genome databases, each of 50 gigabytes.
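The scale of those figures is easier to grasp with a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the article (10 million gigabytes per hour for the SKA, and 100,000 genomes at 50 GB each as a lower bound for "hundreds of thousands"); the unit conversions are standard decimal prefixes.

```python
# Back-of-the-envelope arithmetic for the data volumes in the article.
GB = 1                 # work in gigabytes
TB = 1_000 * GB        # decimal (SI) prefixes
PB = 1_000 * TB

# SKA: 10 million gigabytes per hour
ska_per_hour = 10_000_000 * GB
print(ska_per_hour / PB, "petabytes per hour")   # 10.0 petabytes per hour

# Genomics: a lower bound of 100,000 genomes at 50 GB each
genomes = 100_000
genome_size = 50 * GB
print(genomes * genome_size / PB, "petabytes")   # 5.0 petabytes
```

In other words, a single hour of SKA output would dwarf an entire population-scale genome archive.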
CSIRO has launched a new research program aimed at helping science and business cope with masses of data from areas such as astronomy, gene sequencing, surveillance, image analysis and climate modelling.
The research program, which began this year, is called 'Terabyte Science', named for the terabyte-scale data sets that are now commonplace in research.
Dr John Taylor, from CSIRO Mathematical and Information Sciences, said CSIRO needs to be able to analyse large volumes of complex, even intermittently available, data from a broad range of scientific fields. The trouble is that methods which work well on small data sets don't necessarily work on large ones.
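A simple illustration of that point (this is a generic example, not CSIRO's actual methods): computing statistics the textbook way means loading every value into memory first, which fails once the data outgrows the machine. A one-pass streaming algorithm, such as Welford's online method for mean and variance, processes each value as it arrives and never needs the full data set at once.

```python
# Welford's online algorithm: mean and variance in a single pass,
# without holding the data in memory. An illustrative example of
# rethinking a small-data method for large-data settings.
def streaming_stats(stream):
    n = 0
    mean = 0.0
    m2 = 0.0   # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0   # population variance
    return mean, variance

# Works on a lazy generator just as well as a list, so the values
# could be read from disk or a network without materialising them:
mean, var = streaming_stats(float(i) for i in range(1_000_000))
```

The design point is that memory use stays constant no matter how many values flow through, which is exactly the property small-data approaches lack.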
According to Dr Taylor, the aim of the program is to develop completely new mathematical approaches and processes for scientists across a range of disciplines.
"Large and complex data is emerging almost everywhere in science and industry and it will hold back Australian research and business if it isn't dealt with in a timely way," Dr Taylor said.
Countries such as the US also recognise these challenges, as Dr Taylor saw first-hand during his ten years working in laboratories there.
"This will need major developments in computer infrastructure and computational tools," he said.
Following a workshop in September, specific research areas were identified, and projects are now under way in advanced manufacturing, high-throughput image analysis, modelling of ocean biogeochemical cycles, situation analysis and environmental modelling.
CSIRO in fight to keep masses of data to a minimum
By Staff Writers on Nov 12, 2007 2:23PM