The DIY guide to supercomputers

EARLIER THIS year I wrote about how I built a computer from the parts bins at Fry's, a local Silicon Valley institution. I installed the Linux operating system on it, and said computer is still chugging away as a Web server, telling the tale of how I put it together.

Seems as though my idea caught on.

Mike Warren, an astrophysicist at the Los Alamos National Laboratory in New Mexico, recently put the finishing touches to Avalon, a do-it-yourself supercomputer that his lab team assembled from off-the-shelf, high-end DEC Alpha PCs. After some new CPUs were hooked up, Avalon became the 88th-fastest computer on the planet.

Avalon's price tag - $300,000 - is a bit more than the $700 (pounds 425) I spent, but a fraction of what most supercomputers cost. IBM recently installed "Blue Pacific" at the Lawrence Livermore Laboratory for $94m. True, it's more than 80 times faster than Avalon, but it's also more than 300 times more expensive.

And Avalon is not alone. There are lots of home-built supercomputers. Thomas Sterling, a computer scientist at the California Institute of Technology and Nasa, stirred things up by publishing a specification for low-cost clusters of PCs called Beowulf. Avalon is just one Beowulf-class supercomputer.

The idea is simple. You set up an arbitrary number of PCs, network them (typically with fast Ethernet), and then feed them problems that can be divided up among the machines' processors. One machine acts as a server that co-ordinates all its clients. The Beowulf specification calls for software such as the Message Passing Interface (MPI), running under Linux, that lets the machines communicate. And since Linux - the brainchild of the computer science student Linus Torvalds - is free, it keeps the cost down.
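The divide-and-gather pattern at the heart of a Beowulf cluster can be sketched in miniature on a single machine. The Python example below is purely illustrative: the standard multiprocessing module stands in for MPI and the network, while a "server" chops a big sum into chunks, farms the chunks out to worker processes, and adds up the partial results.

```python
# A single-machine sketch of the Beowulf divide-and-gather pattern.
# Real clusters use MPI across networked nodes; here Python's standard
# multiprocessing module stands in for the network.
from multiprocessing import Pool


def partial_sum(chunk):
    """One worker's job: sum its own slice of the problem."""
    start, stop = chunk
    return sum(range(start, stop))


def cluster_sum(n, workers=4):
    """The 'server': divide the range [0, n) among the workers,
    ship out the chunks, and gather the partial results."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs any remainder
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    # Equals sum(range(10_000_000)), but computed by four "nodes".
    print(cluster_sum(10_000_000))
```

The appeal is that the coordinator never needs to know how the work gets done, only how to split it and recombine the answers; swap the processes for networked PCs speaking MPI and you have the cluster in outline.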

One group, at Australia's University of Southern Queensland, built Topcat, a Pentium-powered, Beowulf-class computer, for just A$8,000 (pounds 3,137). Other Beowulf machines - Loki (a predecessor of Avalon), Grendel and Megalon among them - have various price tags, all modest for their power.

The Linux OS has had a pretty phenomenal year, but it is not the only option: an abstract on the Argonne National Laboratory website describes a machine called Chicago Pile 6 that runs under Windows NT, and Microsoft offers "Wolfpack", clusters of Windows NT machines that can be harnessed together for big problems.

And, to round things out, the UCLA Physics Department's Plasma Simulation Group has a $28,000 parallel computer called Appleseed - eight Power Macintosh G3s - which outperforms a Cray Y-MP. UCLA's statistical mathematics department even plans to link up Apple's new iMacs in its computer lab to make a "very pretty supercomputer".

One advantage of the design of most of these machines is that other machines can be roped in as needed for big jobs. The UCLA physics department home page mentions using other Macs in the department at nights and weekends. If the machines are on the network, they're fair game for becoming part of the supercomputer (as long as you can persuade their regular users). UCLA's Macs even use MacMPI, an MPI implementation for the Mac OS, so that the same code can run on Linux, on Macs, or on a "real" supercomputer without being rewritten.

In fact, earlier this year one group linked thousands of computers of all kinds around the world via the Internet, and cracked a 56-bit encryption code in 40 days. It had previously been thought that such heavyweight ciphers would take hundreds of years to crack, even on fast computers. One version of the program ran as a screen saver that kicked in, and began cracking code, when the machine had been idle for more than a few minutes.

You don't normally think of a screen saver as a powerful program, but when thousands of them act in concert, some serious computing gets done. The group bills itself as the "Fastest Computer on Earth", even though its hardware bill is effectively zero.
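Brute-force key search parallelises almost perfectly: each machine simply tries its own slice of the key space, and no machine needs to talk to any other until a key is found. This hypothetical Python sketch shows the idea on a toy 16-bit "cipher" - a keyed XOR stands in for DES, and the key space is divided the way the real effort divided its 2^56 keys.

```python
# Toy illustration of splitting a brute-force key search across workers.
# A keyed XOR stands in for a real cipher such as DES, and the key space
# is only 16 bits, so the search finishes instantly.

def toy_encrypt(plaintext: bytes, key: int) -> bytes:
    """'Encrypt' by XORing each byte with the two bytes of the key."""
    k = key.to_bytes(2, "big")
    return bytes(b ^ k[i % 2] for i, b in enumerate(plaintext))


def search_slice(ciphertext: bytes, known_plaintext: bytes,
                 start: int, stop: int):
    """One worker's job: try every key in [start, stop)."""
    for key in range(start, stop):
        if toy_encrypt(known_plaintext, key) == ciphertext:
            return key
    return None


def crack(ciphertext: bytes, known_plaintext: bytes, workers: int = 8):
    """The coordinator hands each worker an equal slice of the 2**16 keys.
    (In the real effort, the slices ran simultaneously on thousands
    of machines; here they simply run one after another.)"""
    keyspace = 2 ** 16
    step = keyspace // workers
    for w in range(workers):
        found = search_slice(ciphertext, known_plaintext,
                             w * step, (w + 1) * step)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    secret_key = 0xBEEF
    ct = toy_encrypt(b"attack at dawn", secret_key)
    print(hex(crack(ct, b"attack at dawn")))
```

The same structure scales to any number of machines precisely because the slices never interact; that independence is also, as the article notes below, the approach's limitation.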

To be sure, the megabucks hardware offers some advantages that the home-brew machines can't match. For one, the expensive machines tend to be more general-purpose, and have an advantage when huge volumes of data have to be loaded.

Beowulf clusters tend to do best on problems that break up easily into independent pieces, such as particle simulations, with each machine representing a particle interacting with its neighbours. The code-cracking effort works only because its client software knows nothing but code-breaking: running a different task would mean reinstalling tens of thousands of copies of the client software for every new job.

Some problems just don't chop up easily, and run better on the "big iron". But there aren't many supercomputers around (not at those prices!) and scientists have to queue for time on them. Massive data sets can take so long to load that only a little computing is done in any one window of opportunity.

In fact, it was just this dilemma that prompted Mike Warren and others to build their own supercomputers. They can apply their machines widely, and to classes of problems that would only rarely get a chance on that expensive supercomputer time.

Meanwhile, our own "single-node supercomputer" is churning furiously, trying to figure out how to come up with another $700.