Low-cost, reliable clustering has enabled Monash University to deliver supercomputing power to its researchers.
High Performance Computing Clusters (HPCC), which are virtualised groups of Intel-based blade servers running Linux, have enabled the Melbourne-based university to maintain a supercomputer with non-specialist staff, according to Adrian Ling, Monash's manager of infrastructure and major IT projects.
Ling said that in 2007 the university had to send research projects that required large-scale number crunching to external facilities.
Just two years on, Ling claims the HPCC project has been so successful that the university is close to renting out its spare computing power to third parties.
Ling admits he was initially "doubtful" that the Dell-based HPCC would deliver what was promised, but he said the system has proven its reliability, with some nodes not requiring any attention for almost two years.
"Most of the time, the HPCC sits in the background and I don't get any complaints. I had to turn the system off to do some serious maintenance... and I noticed that some of the older nodes had been up for 598 days," said Ling, who joked that he was tempted to delay the maintenance in order to "break that 600 barrier".
IBRS analyst Kevin McIsaac believes that Ling's experience is evidence that Dell has fixed problems with its early HPCC systems.
"Some of the early Dell blades had a real problem. [Ling] said he hasn't had a single failure and no problems with warranty for years... well apparently the equipment now works, and it works really well," said McIsaac.
According to McIsaac, HPCC is a sign that we are moving to what he calls Mainframe 2.0, which is a supercomputer that is relatively simple to manage and run.
"What we are witnessing now are the very beginnings of the reinvention of computing away from islands of separate servers back to a single resource that is made of commodities but functions a lot like a mainframe," he said.
Monash's Ling also boasted about the lack of specialist skills required to run his HPCC. He claims that IT staff with basic skills have found it relatively easy to maintain the system.
"Manpower is in such short supply and so expensive," he said. "Fortunately it doesn't require a lot of manpower to run. I have some new engineers and I explained to them how to put in a node, how to enable the image, what you do when a user rings up and says the system is not working.
"Believe it or not, they pick it up and run with it. That is proof of a low cost solution [to] me."