The Queensland Cyber Infrastructure Foundation (QCIF), which provides researchers with cloud storage and high performance computing, has revealed plans for an extensive overhaul of its infrastructure.
The Foundation already tacked on an additional 1500 compute cores in mid-2016, just 18 months after it was first built, to deal with increased demand from genomics researchers.
This time, the four-year QRIScloud upgrade will address storage and long-term backups, because its users create a lot of data.
“QBI [the Queensland Brain Institute] has brain scanners that produce 250 TB* for one brain scan,” QCIF service delivery manager Stephen Bird told iTnews.
“Do a few of those a day and you fill up a disk pretty quickly.”
Bird said it’s also becoming increasingly normal for researchers from all fields to conduct data-centric research and for the data they generate to be retained for up to 30 years - in triplicate.
“So three copies of the original, plus any daily snapshots that allow us to roll back any changes users may make,” Bird said.
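As a rough illustration of what triplicate retention means for capacity planning, the arithmetic can be sketched as below. This is a back-of-envelope sketch only, not QCIF's sizing model: the function name and the `snapshot_overhead` parameter are hypothetical, and the article gives no figure for how much space snapshots consume.

```python
def raw_capacity_tb(primary_tb: float, copies: int = 3,
                    snapshot_overhead: float = 0.0) -> float:
    """Estimate raw storage needed for a dataset kept in several copies.

    primary_tb        -- size of the original data, in terabytes
    copies            -- number of full copies retained (three, per the article)
    snapshot_overhead -- ASSUMED extra space for snapshots, as a fraction
                         of the primary size (not stated in the article)
    """
    if primary_tb < 0 or copies < 1 or snapshot_overhead < 0:
        raise ValueError("inputs must be non-negative, with at least one copy")
    return primary_tb * copies + primary_tb * snapshot_overhead


# One 250 TB scan (the figure quoted above), kept in triplicate,
# ignoring snapshots entirely.
single_scan = raw_capacity_tb(250)
print(f"Raw capacity for one scan: {single_scan:.0f} TB")
```

Any non-zero snapshot overhead pushes the total above three times the original, which is why the retention policy matters as much as the scan size itself.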
The organisation has therefore devised a scale-out storage plan to increase capacity. Bird said the plan will see $30m spent over four years, with falling prices expected to deliver greater capacity gains in later years.
The overhaul will start with new AMD-powered compute nodes and a terabyte of memory. The new rig will offer 2,000 virtual CPUs, plus shared storage for root and ephemeral drives. This equipment will replace kit nearing end-of-life.
On top of that, an extra 280 TB of storage will be introduced to replace end-of-life equipment.
Bird says the foundation is also looking at re-engineering QRIScloud’s tape storage system, including adding 10 petabytes of capacity and a disk layer in the data migration facility (DMF) to reduce latency.
“We’re looking at a longer-term shift to [data fabric] DMF7 when it becomes available.
“It currently runs on DMF6 which controls the tape libraries we have across two different sites, which covers our replication strategy to protect against media failure.”
He added that higher-density tape drives and media are being looked at, “probably LTO-9”.
“LTO-8 would probably give us a 50 percent increase, but we’d rather see something a little bit more substantial to go through the effort of tape migration with 20, 30, 40 petabytes of data to shift.”
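Bird's reluctance is easier to see with a little arithmetic: a given percentage gain in per-cartridge capacity translates into a smaller percentage saving in cartridge count. The sketch below is illustrative only. The function name is hypothetical, and it assumes the same data volume is rewritten onto fully used cartridges, ignoring compression and partially filled media.

```python
def cartridge_reduction(capacity_gain: float) -> float:
    """Fraction of cartridges saved when each cartridge holds more data.

    capacity_gain -- relative increase in per-cartridge capacity,
                     e.g. 0.5 for the 50 percent figure Bird quotes.
    """
    if capacity_gain < 0:
        raise ValueError("capacity_gain must be non-negative")
    # The same data needs 1 / (1 + gain) as many cartridges as before.
    return 1 - 1 / (1 + capacity_gain)


# Compare a 50 percent gain with a hypothetical doubling of capacity.
for gain in (0.5, 1.0):
    saved = cartridge_reduction(gain)
    print(f"{gain:.0%} more capacity -> {saved:.0%} fewer cartridges")
```

Under these assumptions a 50 percent capacity increase trims the cartridge count by only about a third, while the migration still has to move every petabyte.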
The service’s special compute will also get a boost, with QCIF considering large-memory nodes of at least 1.5 to 2 terabytes to handle researchers’ large-memory jobs.
“At the moment the ability is with a 1 TB node - 2 TB is still fairly expensive but we’re hoping over time that’ll come down,” Bird added.
A fibre-channel service from AARNet will be provisioned to allow QRIScloud’s back-end tape storage to be reconfigured and optimised.
Connecting users to their data
To manage data movement from QRIScloud to institutions’ on-premises servers, the front-end of the MeDiCI data storage fabric developed by the University of Queensland will be scaled to support the increase in volume and demand permitted by the expansion.
QCIF will also adjust how collection storage is presented to the HPC clusters to improve performance.
To cap it all off, a 100-gigabit network spine will be added to improve the connection between equipment and to support further expansions in the future.
Bird says the demand for increased capacity and better management is already high at QRIScloud.
“We’re seeing growing demand anywhere upwards of 20,000 to 40,000 jobs queued at any point in time, so there’s a lot of demand for that. Thankfully most of those get executed within 24 hours.
“But it just shows that there is demand and that demand will likely increase,” Bird added.
* Yes, 250 TB. We checked.