Western Australia’s Pawsey supercomputing centre is preparing to boost its storage and high-performance compute capabilities in response to the flood of data being generated by radio telescopes and research projects.
Co-funded by the Commonwealth and WA governments and four universities, the Pawsey Centre supports the 36-antenna Australian Square Kilometre Array Pathfinder (ASKAP) and Murchison Widefield Array (MWA) telescopes.
Pawsey’s head of supercomputing David Schibeci told iTnews the sheer volume of data being generated by both ASKAP and other researchers is placing pressure on the facility’s storage and compute capacity.
“The raw data that comes off the [ASKAP] telescopes is about 400 terabits per second, which is more than we can handle. So as part of the NBN project, there were four 10 gigabit fibre links that were run right down the west coast of Australia into the Pawsey Centre itself,” Schibeci said.
“After doing some pre-processing, they can stream that data directly into Pawsey and be able to process it. At the moment, when all 36 telescopes are operational, they can produce 240 TB of data per day.”
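The gap between those two figures gives a sense of how aggressive the pre-processing is. A rough back-of-the-envelope calculation (using decimal units, 8 terabits to the terabyte) shows the scale of the reduction:

```python
# Rough check of the data reduction implied by the figures above.
# Figures quoted in the article; unit conversions are decimal.
raw_rate_tbps = 400                 # raw telescope output, terabits per second
seconds_per_day = 86_400
raw_tb_per_day = raw_rate_tbps * seconds_per_day / 8   # terabytes per day
processed_tb_per_day = 240          # post-processing output, per the article

reduction = raw_tb_per_day / processed_tb_per_day
print(f"raw volume:       {raw_tb_per_day:,.0f} TB/day")   # 4,320,000 TB/day
print(f"reduction factor: ~{reduction:,.0f}x")              # ~18,000x
```

In other words, the raw stream works out to more than four exabytes a day, which is why the pre-processing step near the telescopes is essential before anything reaches Pawsey's fibre links.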
Pawsey’s main workhorse is a Cray XC40 supercomputer called Magnus, which went live as a petascale system in September 2014 and runs 1488 compute nodes for researchers.
The centre also features a Cray XC30 called Galaxy that is dedicated to the radio astronomers, along with a small commodity Linux cluster called Zeus, which primarily supports pre- and post-processing applications as well as visualisation.
While Magnus was the most powerful supercomputer in the southern hemisphere when it went live less than two years ago, Schibeci said it is already running at capacity.
“The last time I had a look at the usage report – I believe it was the March usage report – 97 percent of the capacity of Magnus available was used. The number of jobs sitting in there is just crazy at the moment,” Schibeci said.
“Every time we get a larger system, it doesn’t take long for that capacity to be consumed. The last time we went out for a national merit call, researchers asked for three times as many CPU hours as we have on Magnus.”
To handle the growing demand, Pawsey is in the early stages of a long-term project dubbed the Advanced Technology Cluster, which will evaluate options for an even more powerful high-performance compute system.
“One of the things we’re aware of – because our support agreement with Cray runs out in September 2017 and we’re looking to extend that another year – is that by September 2018 we need to look at either expanding or replacing Magnus,” Schibeci said.
“[The ATC project] is looking at some of the technologies that will, perhaps, make whatever the replacement for Magnus is.”
In the meantime, however, the Pawsey Centre is working to add storage to the system to tide researchers over until the Advanced Technology Cluster can be built.
It is in the process of installing a version of the open-source Lustre storage system that is supported by Intel, and has recently signed a hardware deal with Dell, funded through the National Collaborative Research Infrastructure Strategy.
The new group file system will deliver 1.7 PB of storage to Pawsey's users, replacing the stopgap measure Pawsey has been relying on until now.
According to Schibeci, Pawsey’s original design did not include a group file system for ongoing compute projects, so the team had been forced to repurpose 768 TB from scratch storage – without backups.
"We’ve been self-supporting for the past two years,” he said.
The new group file system will be physically located alongside Galaxy and Magnus and will be connected through a high-speed 72 Gbps interconnect. It will consist primarily of spinning disks.
Schibeci plans to run both file systems in parallel while he synchronises the data, and then make the switch on an upcoming maintenance day.