The University of Southern Queensland is gearing up for the launch of a new supercomputer it hopes will broaden the range of faculties and number of graduates making use of high-performance computing.
The SGI system, which will go live in the coming weeks, features 580 cores across 29 nodes, including 24 standard compute, one GPU and four large memory nodes. It runs Red Hat Linux along with a range of SGI software including compilers and the PBS Pro job scheduling system.
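For readers unfamiliar with PBS Pro, work on a cluster like this is submitted as a batch script. A minimal sketch follows; the queue name, resource counts and application names are illustrative, not USQ's actual configuration:

```shell
#!/bin/bash
# Hypothetical PBS Pro job script -- queue, resource and module
# names are illustrative, not USQ's actual configuration.
#PBS -N my_simulation
#PBS -l select=1:ncpus=20:mem=64gb   # one standard compute node
#PBS -l walltime=02:00:00            # two-hour limit
#PBS -q workq

cd "$PBS_O_WORKDIR"      # run from the directory the job was submitted in
module load openmpi      # site-specific; load the software the job needs
mpirun ./my_simulation   # launch across the allocated cores
```

A researcher would submit this with `qsub job.pbs` and check progress with `qstat`; the scheduler queues the job until the requested cores and memory are free.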
USQ executive director for IT services Scott Sorley told iTnews that unlike Monash University’s MASSIVE-3 HPC system, which was launched in February, USQ has not opted for a GPU-based supercomputer.
“The majority of our HPC work requires plain vanilla compute capacity. We have got a few GPU nodes in there to try to encourage our users to develop into that space,” Sorley said.
“When we implemented our old HPC back in 2009 I enquired broadly whether our researchers were interested in GPU processing and at that point there was no demand. There was a little more interest in it last year, so we’ve got a GPU node, but the majority of its use is as a standard x86 compute machine.”
The IT team first installed equipment for the new supercomputer last October. Users started testing the system about two months later as the university began the time-consuming task of testing and validating software.
“We’ve installed it in a data centre at our Toowoomba campus. When we installed our previous HPC system in 2009, it was put in a brand-new data centre,” Sorley said.
“We designed and built that data centre with the expectation that it would include HPC, so we built it with the ability to run 30 kilowatts per rack, lots of power and in-rack cooling, so we already had a space allocated for the next iteration of our HPC.”
The yet-to-be-named HPC, which uses Intel processors, replaces an older AMD-powered HPC system the university acquired in 2009.
“One of the reasons why we changed [processor vendors] is because it seemed to offer the biggest bang for the buck in terms of throughput. We’re also looking for some performance improvements on some of our code by moving to using Intel-based compilers on Intel chips,” Sorley said.
Following the soft go-live, the university is now in the process of finalising the name of the computer ahead of the official launch.
“USQ’s logo is of a phoenix, so it was a great irony when I heard that the University of Adelaide was calling their new machine Phoenix. We’re running an internal naming competition, and once we’ve chosen the name, we expect to launch in the next couple of weeks.”
Complementing state and national capabilities
The HPC will complement the compute capabilities USQ researchers have access to through the Queensland Cyber Infrastructure Foundation (QCIF), as well as national bodies such as Nectar.
The university opted to augment available state and national compute capabilities with an on-campus supercomputer for convenience, ease of access, and to let researchers ease into using an HPC.
“We have a process where users can cut their teeth on smaller local HPC systems, and then as they get skilled at that, it allows them then to apply for grant-based time on bigger HPC systems at QCIF and nationally,” Sorley said.
“If you’re working on a job that will work on a single node or a cluster of small nodes, QRIScloud and the Nectar Cloud work really well, and then the next tier above that is the institutional-based systems, and then you have QCIF and the national systems. And we have researchers working in all of those spaces.”
The university hopes that by increasing the compute capabilities over what was previously available on-campus, a broader range of faculties will be able to make use of high-performance computing.
“Some of our agricultural engineers and business faculty are keen to use more computational power,” Sorley said.
Because the new computer will be the first HPC system many research students use, making sure it is properly documented has been an essential task for Sorley’s team.
“For our post-graduate researchers, this is where they develop their skills, because once they move on to larger facilities that require a competitive grant for space allocation, one of the questions they’re typically asked is whether they’ve run the job before at other facilities, and about their experience,” Sorley said.
“Many of them are used to Windows, while most scalable HPC facilities use Linux, so you need to get those users used to using the platform and the applications.
“So it’s important to have good documentation in place to allow students to do things like log on and use the system, and as their workloads grow we expect them to be able to move their workloads on to QCIF or a national HPC if required.”
The most time-consuming and painstaking part of the installation process for the new machine has been making sure it is compatible with the applications researchers currently rely on.
“It’s quite easy to get a new piece of hardware, or a new HPC, up and running. That’s a technical problem that most IT departments are quite good at,” Sorley said.
“The really tricky bit is making sure that a particular user with a particular piece of software they’ve used in the past will work on that platform.
“So we’ve been going through, ticking off software applications, testing them and rebuilding them so they meet the users’ needs and work the way the users expect them to work.”
Given the large number of applications in use, that meant identifying each piece of software, then recompiling and optimising it to work on the new system.
“As far as our application list, it’s pretty long and extensive. Our engineering users are fairly strong users of Matlab, Ansys and Comsol,” Sorley said.
“And there is quite a lot of software custom built for areas such as astrophysics that run in Python or C, while some of our climate group run software that’s written in R. So there’s quite a few bespoke software packages written for specific disciplines.”
While the IT team initially loaded the machine with the most recent version of each of the software packages, it soon became apparent that many of the bespoke applications weren’t designed to run against the most recent versions of their dependencies.
“With the open compute platforms, like MPICH and Open MPI, we have multiple versions of those to make sure there’s compatibility for different users requiring different versions of the software,” Sorley said.
“We [also] had to make sure that we had three or four different versions of Python installed because applications were written against different versions of Python, and when you upgrade Python to the latest version, applications might not work anymore.
“So getting those dependencies mapped out was the most time-consuming task. When we reviewed the software that we need to run, it also became apparent that quite a bit of it wasn’t in use anymore.
“My key bit of advice is to know your application environment and your end users really, really well to make sure your system will run what you need it to run.”
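The version juggling Sorley describes is typically handled with an environment-modules system, which lets each user select the compiler, MPI and Python builds their code was written against. A hypothetical session on such a cluster might look like this; the module names and version numbers are illustrative, not USQ's actual catalogue:

```shell
# Hypothetical environment-modules session -- package names and
# versions are illustrative, not USQ's actual software catalogue.
module avail python                        # list the Python builds installed site-wide
module load python/2.7.11                  # pick the version an older application expects
python --version                           # confirm which interpreter is now on PATH
module switch python/2.7.11 python/3.5.1   # swap interpreters for newer code
module load openmpi/1.10.2                 # similarly, select a specific MPI build
module list                                # show everything currently loaded
```

Because each module simply prepends its own paths to the environment, several incompatible versions of the same package can coexist on one system, which is how a single cluster can serve applications written against different Python or MPI releases.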