By raising chilled water temperatures by 4.4 degrees Celsius, researchers at Argonne's Leadership Computing Facility (ALCF) have reduced power consumption by an estimated 800 kilowatts, saving about US$50,000 per year.
The ALCF operates an IBM Blue Gene/P supercomputer called Intrepid, which launched in June 2008 as the third-fastest computer in the world.
According to ALCF researchers, cooling the 557-teraflop machine currently consumes more electricity than running it does.
iTnews sat down with ALCF director Pete Beckman and ALCF project manager Jeff Sims to discuss the facility's innovative power management techniques and their potential for the enterprise.
Please introduce yourselves and your roles at ALCF.
Pete: I am the director of the ALCF and I am also a research scientist in the Math and Computer Science division.
Jeff: I'm a project manager at Argonne, so as projects come up, I help get them organised and follow through till the end. My background is in engineering.
Could you tell us a little about Argonne; why was Intrepid installed and what are its goals?
Pete: The DOE [US Department of Energy] maintains two very large computing facilities that host the largest, fastest computers of their kind for open science. The Argonne facility is designed to provide supercomputer cycles for the most challenging computational problems around the country and around the world.
Every year, people submit proposals asking for time on the machine, and the best proposals are then given time. Anyone in the U.S. who is willing to publish their results in an open way can apply.
We have people from the National Science Foundation, the National Institutes of Health and also collaborators from the U.K. and France who apply and get time on our supercomputer. We have projects that span nanotechnology and biology at a molecular scale, all the way up to scales of galaxies and stars.
Why is power consumption a particular concern?
Jeff: I'd answer that in three different ways. Number one, the DOE strives to be energy-efficient in everything that they do.
Number two, next-generation machines will have even greater electrical demands, so we're trying to do everything that we can now to engineer more efficient solutions so we can kind of stay ahead of the wave of more power-hungry machines.
Three, reduced operating costs allow us to procure more hardware. The less we pay on our power bill, the more money is freed up to buy hardware.
I understand that cooling accounts for a significant portion of Intrepid's power consumption. Can you please describe the cooling process and how you plan to reduce its energy requirements?
Jeff: Since the Blue Gene/P has high air-cooling demands, about 5,000 cubic feet per minute per rack, conventional computer room air conditioners, or CRAC units, aren't really effective in our application.
We've employed a high-volume, building-type air handler system to pressurise the underfloor plenum with air at about 64°F (17.8°C). That 64°F air gets pulled up through the machine, and that's what cools it.
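To put the 5,000 cubic feet per minute figure in context, the standard sensible-heat rule of thumb for air, Q (BTU/h) ≈ 1.08 × CFM × ΔT (°F), estimates how much heat that airflow can carry away. The sketch below assumes a 20°F rise in air temperature across the rack; that ΔT and the function name are hypothetical, not values given in the interview.

```python
# 1 BTU/h is about 0.293 W, so divide by 1,000 for kW.
BTU_PER_HOUR_TO_KW = 0.29307107 / 1000


def heat_removed_kw(airflow_cfm: float, delta_t_f: float) -> float:
    """Sensible heat carried by standard air: Q[BTU/h] = 1.08 * CFM * dT[°F]."""
    # 1.08 = 0.075 lb/ft^3 (air density) * 60 min/h * 0.24 BTU/(lb*°F) (specific heat)
    btu_per_hour = 1.08 * airflow_cfm * delta_t_f
    return btu_per_hour * BTU_PER_HOUR_TO_KW


# 5,000 CFM per rack is from the interview; the 20°F rise is a hypothetical assumption.
print(f"{heat_removed_kw(5000, 20):.1f} kW per rack")
```

The estimate scales linearly with the assumed ΔT, so it is only as good as that assumption.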
The 64°F air is produced by passing the air through cooling coils that chilled water flows through. The production of that chilled water that flows through those cooling coils is the most costly part of the cooling process.
During our fall, winter and spring months in northern Illinois [where ALCF is located], water is cooled by the environment, then we pull that back into the room and that's used in the cooling coils to cool off the air.
This process is called waterside economising, or free cooling. It's not really a new technique, but it's something that nowadays, with green building techniques and sustainability, we have to pay close attention to.
During the warm summer months, Mother Nature isn't going to cool off the water for you. In that case, we use what's called mechanical cooling, where a centrifugal chiller compresses refrigerant and creates cold water, kind of like your refrigerator at home.
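The choice between free and mechanical cooling can be sketched as a simple rule. This assumes a cooling tower whose coldest achievable water is the outdoor wet-bulb temperature plus a fixed approach; the 10°F approach and the function name are hypothetical, and real plants also run a partial, mixed mode that this sketch ignores.

```python
def cooling_mode(outdoor_wet_bulb_f: float, chw_setpoint_f: float,
                 approach_f: float = 10.0) -> str:
    """Return "free" if the tower alone can reach the chilled water setpoint, else "mechanical"."""
    # Coldest water the tower and heat exchanger can deliver (assumed fixed approach).
    achievable_f = outdoor_wet_bulb_f + approach_f
    return "free" if achievable_f <= chw_setpoint_f else "mechanical"


# A 40°F wet-bulb day: free cooling fails at a 46°F setpoint but works at 54°F.
print(cooling_mode(40, 46))  # mechanical
print(cooling_mode(40, 54))  # free
```

Raising the setpoint widens the range of outdoor conditions that qualify for free cooling, which is the effect discussed later in the interview.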
That compressor is very power-hungry, especially when you're talking about 600 to 800 tons of cooling in our case. That's the thing that uses a lot of electricity, so we're trying to minimise the period where we run those compressors.
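A ton of refrigeration is a rate of heat removal (12,000 BTU/h, about 3.517 kW), not a mass, so the 600 to 800 ton figure can be converted to kilowatts of heat as a quick sanity check. This is heat moved, not electricity drawn; the chiller's electrical input is smaller by a factor equal to its coefficient of performance.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h ≈ 3.517 kW


def tons_to_kw(tons: float) -> float:
    """Convert refrigeration tons to kilowatts of heat removal."""
    return tons * KW_PER_TON


for tons in (600, 800):
    print(f"{tons} tons ≈ {tons_to_kw(tons):,.0f} kW of heat removal")
```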
What we're trying to do now is determine the warmest temperatures that we can dump into those cooling coils so we can do two things. First, we can maximise the free cooling period.
The second thing is that the warmer temperature allows us to run the chillers less, and less hard. The compressors don't have to work as hard to provide 55°F (12.8°C) water compared to 46°F (7.8°C) water.
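Why a warmer setpoint eases the compressor can be illustrated with the ideal (Carnot) coefficient of performance for refrigeration, COP = T_cold / (T_hot - T_cold), with temperatures in kelvin. This is an idealised upper bound, not a model of ALCF's chillers: the 95°F condenser temperature is a hypothetical assumption, and the evaporator is simplified to run at the chilled water temperature.

```python
def f_to_kelvin(temp_f: float) -> float:
    """Convert a temperature reading from °F to kelvin."""
    return (temp_f - 32) * 5 / 9 + 273.15


def carnot_cop(cold_f: float, hot_f: float) -> float:
    """Ideal refrigeration COP: heat moved per unit of compressor work."""
    t_cold, t_hot = f_to_kelvin(cold_f), f_to_kelvin(hot_f)
    return t_cold / (t_hot - t_cold)


CONDENSER_F = 95.0  # hypothetical heat-rejection temperature

cop_46 = carnot_cop(46, CONDENSER_F)
cop_55 = carnot_cop(55, CONDENSER_F)
# Work per unit of cooling is 1/COP, so the fractional reduction in work is 1 - cop_46/cop_55.
work_reduction = 1 - cop_46 / cop_55
print(f"COP at 46°F: {cop_46:.1f}, at 55°F: {cop_55:.1f}")
print(f"Ideal compressor work reduction: {work_reduction:.0%}")
```

Real chillers fall well short of Carnot efficiency, and the interview's own ballpark for a 54°F setpoint is 10 to 15 percent; the sketch only shows the direction and rough size of the effect.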
How are you able to use warmer water to cool the machine?
Jeff: Engineers by their training are conservative people. Using the best guesses that the vendor has, they come up with a design that tells you what chilled water temperature you should run at.
You find out after you put the computer in that your mixture of applications probably doesn't drive the computer to the peaks that were used in the engineering design.
What we're learning now from our operating history is how the computer actually runs and what chilled water temperature it really needs to stay cool.
So far, we've been able to raise the chilled water temperature from its original design point of 46°F (7.8°C) to as high as 54°F (12.2°C).
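The Celsius figures here, and the 4.4°C rise quoted at the top of the article, follow from the standard conversion. A temperature difference, unlike a temperature reading, converts without the 32-degree offset, which is an easy slip to make.

```python
def f_to_c(temp_f: float) -> float:
    """Convert a temperature reading from °F to °C."""
    return (temp_f - 32) * 5 / 9


def delta_f_to_c(delta_f: float) -> float:
    """Convert a temperature difference from °F to °C (no offset)."""
    return delta_f * 5 / 9


original_f, raised_f = 46, 54
print(f"{original_f}°F = {f_to_c(original_f):.1f}°C")  # 46°F = 7.8°C
print(f"{raised_f}°F = {f_to_c(raised_f):.1f}°C")  # 54°F = 12.2°C
print(f"Rise: {delta_f_to_c(raised_f - original_f):.1f}°C")  # Rise: 4.4°C
```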
Ballpark numbers are that it's saved about 10 to 15 percent of the electrical demand on the chillers, and it extends the free cooling period by as much as two months each year. Now we can do [free cooling for] at least eight months out of the year with 54°F water.
In Illinois, if you can get your chilled water requirement up to 70°F to 75°F (21.1°C to 23.9°C), you can free cool the entire year. That's an interesting thing that we're talking to IBM about: using warmer temperatures on future machines so we can maximise our free cooling period.
How much power and money is the ALCF saving by using free cooling?
Pete: During the free cooling period, it could be upwards of US$25,000 a month. It's a third of the machine power [that has been saved], so maybe 300 to 400 kilowatts.
But let's say this machine uses on average a megawatt of power. Next-generation machines and the generation after that could be using upwards of 40 megawatts of power.
Little tweaks that we're doing right now may not seem to have a huge impact, but we need to learn from what we're doing and optimise this for future systems, because in future systems, doing these little changes will literally be [saving] millions of dollars a year.
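Pete's figures can be sanity-checked with simple energy arithmetic: power saved × hours × electricity price. The interview does not give ALCF's electricity rate, so the US$0.10 per kWh below is a hypothetical assumption, as is applying the same one-third saving to a future 40 MW system; the point is the order of magnitude, not a forecast.

```python
HOURS_PER_MONTH = 8760 / 12  # average hours in a month (730)
PRICE_PER_KWH = 0.10  # hypothetical US$/kWh; not stated in the interview


def monthly_savings_usd(kw_saved: float, price_per_kwh: float) -> float:
    """Cost of the energy not drawn over one average month."""
    return kw_saved * HOURS_PER_MONTH * price_per_kwh


# Intrepid today: roughly 300 to 400 kW saved during the free cooling period.
for kw in (300, 400):
    print(f"{kw} kW saved -> about US${monthly_savings_usd(kw, PRICE_PER_KWH):,.0f}/month")

# Hypothetical: the same one-third saving applied to a 40 MW machine.
future_kw_saved = 40_000 / 3
print(f"40 MW machine -> about US${monthly_savings_usd(future_kw_saved, PRICE_PER_KWH):,.0f}/month")
```

At that assumed rate, 300 to 400 kW comes to roughly US$22,000 to US$29,000 a month, in line with the "upwards of US$25,000" estimate, and the same proportional saving on a 40 MW machine approaches US$1 million a month.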