Along with a research team from the U.S. Department of Energy’s Lawrence Berkeley National Laboratory, Shalf is designing a supercomputer based on low-power embedded microprocessors, which has the sole purpose of improving global climate change predictions.
Shalf spoke with iTnews about the desktop and embedded chip markets, inefficiencies in current supercomputing designs, and how the Berkeley design will achieve a peak performance of 200 petaflops while consuming less than four megawatts of power.
How do mobile devices come into play in your supercomputer design?
The motivation is that we’ve gotten to the point where the cost of power that goes to new supercomputing systems is getting to be very close to the cost of buying them.
When you move from the old [approach to] supercomputing, which is performance-limited, to supercomputing where the limiting concerns are power and cost, then all the lessons that we need to learn are already well-understood by people who manufacture microprocessors for cell phones.
A lot of [current] supercomputers have USB ports and sound chips -- they will never be used and yet they consume power. They [manufacturers] call it commodity off the shelf [COTS] technology, where if you want to have things cheap, you leverage the mass market.
Now that the market has moved away from desktops down to individual cell phones, it’s going to change the entire computing industry, I think.
In terms of investment in microprocessor technology, it used to be dominated by the desktop machines, but now the iPhones and iPods are where all the money for research into advanced microprocessor designs is going. We’re leveraging that trend, and we’re kind of like the early adopters of that idea.
How will the Berkeley design require less power than current approaches to supercomputing?
The desktop chip market or the server market, which we’ve been basing our supercomputer designs on, emphasises serial performance. That means getting high clock frequencies and making things that aren’t parallel -- like Microsoft Word or PowerPoint -- run as fast as they can.
However, when you look at the physics of how power consumption is related to clock frequency, voltage squared is related to clock frequency. So if you reduce the clock frequency modestly, you get a cubic power efficiency benefit.
Take a high-end server chip that consumes 120W running at 2GHz: if we just drop the clock frequency to 600MHz, we can get the wattage down to 0.09W.
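As a rough check on the frequency-scaling argument, the sketch below assumes the common approximation that dynamic power is proportional to voltage squared times clock frequency, and that supply voltage can be lowered roughly in proportion to frequency, so power falls with about the cube of the clock speed. The function name `scaled_power` and the exponent parameter are illustrative assumptions, not anything from the Berkeley team.

```python
def scaled_power(base_watts, base_hz, new_hz, exponent=3.0):
    """Estimate power after a clock change, assuming power ~ frequency**exponent.

    The cubic exponent is an idealised assumption (voltage scaled with frequency).
    """
    return base_watts * (new_hz / base_hz) ** exponent


# Figures quoted in the interview: a 120W server chip at 2GHz, slowed to 600MHz.
base_watts = 120.0
base_hz = 2.0e9
new_hz = 600.0e6

estimate = scaled_power(base_watts, base_hz, new_hz)
print(f"Cubic-scaling estimate at 600MHz: {estimate:.2f} W")
```

Under these assumptions the estimate comes to about 3.24W, well above the 0.09W quoted. The quoted figure presumably also reflects savings beyond clock scaling, such as a much simpler core; that reading is an assumption rather than something stated in the interview.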
Another way to reduce power is to remove anything that you don’t need for that particular device from the processor.
[Partner company] Tensilica can create 200 new microprocessor designs per year. Their tools allow them to tailor-make special processors for each new thing they want to do, and they can do it very fast.
We’re using their design tools to make a microprocessor that removes everything that we don’t need for this climate application.
Can the same concept be used in general purpose supercomputing? Are general purpose computers a feasible concept?
In order for this [the Berkeley approach] to work, you need a problem that runs in parallel, because you need more of these [low clock frequency] processors to match the performance of a really big one. It happens that scientific applications already have plenty of parallelism available.
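The trade-off described here can be put in back-of-the-envelope form. The sketch below assumes, purely for illustration, that each slow core does the same work per clock cycle as the fast chip and that the workload parallelises perfectly; neither assumption comes from the interview, and `cores_needed` is a hypothetical helper. The clock speeds and wattages are the ones quoted earlier in the interview.

```python
import math


def cores_needed(fast_hz, slow_hz):
    """Slow cores required to match one fast core's clock throughput.

    Assumes equal work per cycle and perfect parallel scaling (idealised).
    """
    return math.ceil(fast_hz / slow_hz)


# Figures quoted in the interview.
fast_hz, fast_watts = 2.0e9, 120.0    # high-end server chip
slow_hz, slow_watts = 600.0e6, 0.09   # low-power embedded core

n = cores_needed(fast_hz, slow_hz)
total_watts = n * slow_watts

print(f"{n} slow cores, {total_watts:.2f} W in total, versus {fast_watts:.0f} W")
```

With these numbers, four slow cores stand in for one fast chip at a fraction of the power, which is why the approach depends on the application having enough parallelism to keep all of them busy.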
While the desktop chips -- the Intels, the AMDs -- can’t really play this game because things like Microsoft Word aren’t running in parallel, we can exploit this way beyond the ability of the desktop chip.
I think they [general purpose supercomputers] are a realistic idea, and there’s still a place for general purpose supercomputing systems. In terms of general purpose supercomputing, there will be large systems that handle a broad array of applications.
We’re saying for certain computation problems, ours is the correct approach, but it doesn’t supplant the need for general purpose computing because there are many problems that are much smaller than the petaflop or the exaflop.
What is your research team working on currently?
We’re currently doing this iterative process where we adjust all the aspects of the processor -- how much memory it has, how fast the memory is, how many instructions it does per clock cycle. All these things are fixed in a conventional desktop chip, but we can adjust everything about the microprocessor design.
We have something that automatically tunes the software after we make a hardware change, then we benchmark it, measure how much power it takes, then we change the hardware again. We keep on iterating to come up with the optimal hardware and software solution for power.
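The loop described above (change the hardware, retune the software, benchmark, measure power, repeat) might be sketched as follows. Everything in this example is hypothetical: `tune_software` and `benchmark` are toy stand-ins with made-up formulas, not the team's tools or measured data, and the search is a plain exhaustive sweep over a tiny design space rather than whatever strategy the researchers actually use.

```python
from itertools import product


def tune_software(hw):
    """Hypothetical auto-tuner: pick the largest block size that fits in memory."""
    candidates = [16, 32, 64, 128]
    fitting = [b for b in candidates if b <= hw["memory_kb"]]
    return max(fitting) if fitting else min(candidates)


def benchmark(hw, block_size):
    """Toy model (not measured data): returns (performance, watts)."""
    perf = hw["issue_width"] * hw["clock_mhz"] * block_size / (block_size + 16)
    watts = 1e-9 * hw["issue_width"] * hw["clock_mhz"] ** 3 + 1e-4 * hw["memory_kb"]
    return perf, watts


def search(design_space):
    """Try each hardware option, retune, benchmark, keep the best perf per watt."""
    best = None
    for hw in design_space:
        block_size = tune_software(hw)            # retune software for this hardware
        perf, watts = benchmark(hw, block_size)   # benchmark and measure power
        score = perf / watts
        if best is None or score > best[0]:
            best = (score, hw, block_size)
    return best


# Hypothetical design space: clock speed, local memory and instructions per cycle.
design_space = [
    {"clock_mhz": c, "memory_kb": m, "issue_width": w}
    for c, m, w in product([300, 600, 900], [32, 64, 128], [1, 2])
]

best_score, best_hw, best_block = search(design_space)
print(best_hw, best_block)
```

The point of the sketch is the shape of the loop: software is retuned for every hardware candidate before it is scored, so hardware and software are optimised together rather than one after the other.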
When do you expect to achieve tangible results?
We want to demonstrate the first prototype in November, but that would just be one processing element in the system. The way that we’re able to do that is using something called RAMP [Research Accelerator for Multiprocessors] at U.C. Berkeley, which is a system that allows us to prototype new hardware designs without actually building them from scratch.
If we were to actually get enough money to create some chips, it would take us an additional six months or so to get chips out.
How much will the Berkeley supercomputer cost to build?
The cost of putting all the components in the same place is probably in the $30-50 million range. A lot of that cost is just the cost of memory chips.
However, I’d point out that that’s the typical cost of buying, from IBM or Cray, a supercomputing system. So it’s on the same order of cost that we currently put into systems that are one thousandth of the performance that we need to solve this problem.
Are any other groups taking the same approach to supercomputing as yours?
There is something called MD-GRAPE which is in Japan [and used for] molecular dynamics. They [researchers] showed that by designing a custom chip for that application, they could do something that was 1,000 times more efficient than using conventional microprocessors.
It cost them a total of $9 million to do that.
Another group is D.E. Shaw Research that has built a system called Anton. They are also using Tensilica processors, and that system is 100 times to 1,000 times more powerful than is achievable using conventional microprocessors.
MD-GRAPE is an older system. Anton, they just booted up the first nodes a couple of months ago, and they’ve been testing that out, demonstrating that it works.
How does the Berkeley approach differ from the other projects?
It’s [climate change] a new area, and we’re also leveraging off-the-shelf design tools more and are less dependent on fully customised hardware, which requires a lot more energy and time investment.
Do you foresee any commercial opportunities for the technology?
People have spent such a long time saying you can’t compete against the big microprocessor companies to create an efficient machine for science, and that was definitely true when power wasn’t a limiting factor. But now, we need to show the feasibility of this approach so that it can change the way that we design machines for supercomputing.
If we’re successful as researchers, IBM or Intel -- if they see a market for this -- will turn around and do this.
More information is available from the Lawrence Berkeley National Laboratory's Web site.
Researcher discusses iPod supercomputer
By Liz Tay on May 9, 2008 1:52PM