Griffith University will retire its high-performance computing (HPC) hardware when it falls out of warranty, shifting workloads to a mix of offsite HPC clusters and cloud services.
The project is expected to take 18 months to complete and forms part of a broader university-wide cloud and managed services push.
Currently, the Southeast Queensland university operates an on-premises SGI HPC cluster called Gowonda, alongside an older Sun system called V20z.
Both will be out of warranty in 18 months' time, and Griffith University eResearch services and scholarly application development (eRSAD) director Malcolm Wolski said the university would not replace them.
Instead, it plans to migrate workloads that presently run on them to QCIF’s Euramoo cluster, NCI and the Nectar research cloud within the next six months.
The university is also looking to allow researchers to burst onto the Amazon public cloud within 18 months.
“We're moving from a monolithic piece of infrastructure we keep in our own data centre as an HPC to a mixed-service model utilising commercial services and some other more specialised infrastructure,” Wolski said.
"Our work to date has shown that generally 80 percent of job types we could probably move to the cloud tomorrow.
"The remaining 20 percent of job types might need a bit of work - for example, in the genomics space, where you need high-speed data transfer between storage and the nodes. That infrastructure tends to be a bit difficult to find in cloud services.”
eRSAD support services manager Andrew Bowness said that having Amazon cloud services as an option could help the university combat large peaks in demand for compute resources.
"When we looked back at our traditional HPC stats, every once in a while we get these massive spikes of single-core jobs where someone is running up 1000 instances of the same kind of job that runs for a day on a machine," Bowness said.
"That sort of task works really well on Amazon."
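The workload Bowness describes is embarrassingly parallel: the same single-core job repeated over many inputs, with no communication between instances, so it can be fanned out across as many machines as a cloud provider will rent. A minimal sketch of the pattern, where `run_job` and its trivial computation are purely illustrative placeholders rather than any actual Griffith workload:

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(param: int) -> int:
    """Stand-in for one single-core research job (hypothetical;
    the real job would be a day-long simulation or analysis run)."""
    return param * param

def run_batch(n_jobs: int) -> list[int]:
    # Each job is independent of the others, so the batch can be
    # spread across however many workers - local cores or cloud
    # instances - happen to be available; results come back in
    # submission order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(run_job, range(n_jobs)))
```

Because no job depends on another, the same fan-out maps naturally onto a burst of short-lived cloud instances rather than a fixed on-campus queue, which is why spiky demand of this shape suits a pay-per-use service.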
However, Bowness noted that cost would "come into the equation" with any use of Amazon instances for research purposes.
The push to expand high-performance compute options is part of a broader university-wide cloud strategy called Project Iris, which has already seen the university’s three on-premises data centres reduced to two, with the aim of eventually having only a single data centre on campus.
“Historically a lot of our discussions here around HPC have focused around how we keep up with user demand and keep the queues running, whereas in the past year or two we’ve turned the conversation around and asked ‘is there a better way of doing this that is more cost effective?',” Wolski said.
“There will still be a drive to get more users onto HPC, but it’s all about research benefit rather than having a million dollars’ worth of hardware sitting in a room on the Gold Coast.”
Cloud software licensing
According to Wolski and Bowness, the project faces challenges including reconceptualising the HPC service model and re-evaluating licensing agreements to determine whether they allow the university to run various applications in the cloud.
“We've got quite a bit of software that we run locally on our infrastructure here, and we haven't gone through our whole back catalogue and said 'well, what can we run in Amazon?'” Bowness said.
"In some cases, if we have software-based networking, the vendor will consider that on-premises for licensing purposes, and it's not an issue. But not all vendors' licences allow it,” Bowness said.
The eRSAD team is looking to deploy Nimrod, a scheduling tool with a web-based front end, to mask the differences between the research clouds.
Nimrod was developed by Australian universities to let researchers submit jobs regardless of the infrastructure that handles them.