A Rough Guide to Scientific Computing On the Playstation 3

Eugen Leitl has just posted a message to the Beowulf list with a link to a draft of a paper by Alfredo Buttari, Piotr Luszczek, Jakub Kurzak, Jack Dongarra and George Bosilca called A Rough Guide to Scientific Computing On the Playstation 3. It’s a 74-page PDF looking at the possibilities and problems of using the PS3 for scientific computing (there is already a PS3 Linux cluster at NCSU).

The introduction to the paper lets you know that this isn’t going to be easy:

As exciting as it may sound, using the PS3 for scientific computing is a bumpy ride. Parallel programming models for multi-core processors are in their infancy, and standardized APIs are not even on the horizon. As a result, presently, only hand-written code fully exploits the hardware capabilities of the CELL processor. Ultimately, the suitability of the PS3 platform for scientific computing is most heavily impaired by the devastating disproportion between the processing power of the processor and the crippling slowness of the interconnect, explained in detail in section 9.1. Nevertheless, the CELL processor is a revolutionary chip, delivering ground-breaking performance and now available in an affordable package. We hope that this rough guide will make the ride slightly less bumpy.

Of course, it’s unlikely you’re going to see the PS3 being used in production clusters anyway, so the interconnect shouldn’t be such a problem there. 🙂

The paper covers the hardware, Linux support and how to get it onto a PS3, programming methods and models, MPI, performance, etc. The paper isn’t complete as I write, but it is still a very interesting read. HPC folks will certainly want to read section 9.1 “Limitations of the PS 3 for Scientific Computing”, especially the part that says:

Double precision performance. Peak performance of double precision floating point arithmetic is a factor of 14 below the peak performance of single precision. Computations which demand full precision accuracy will see a peak performance of only 14 Gflop/s, unless mixed-precision approaches can be applied.
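
For anyone wondering what a "mixed-precision approach" might look like, here is a minimal sketch of one common technique, iterative refinement for a linear system: do the expensive solve in single precision, then compute the residual in double precision and correct the answer. This is an illustration, not code from the paper. The function names (f32, solve_single, residual, refine) are made up for this example, single precision is only emulated by rounding through Python's struct module, and a real implementation would factor the matrix once and reuse the factors rather than re-solving each time.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_single(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to (emulated) single precision."""
    n = len(b)
    # Single-precision copy of the augmented matrix [A | b].
    M = [[f32(A[i][j]) for j in range(n)] + [f32(b[i])] for i in range(n)]
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(factor * M[k][j]))
    # Back substitution, still in single precision.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def residual(A, b, x):
    """r = b - A x, computed in full double precision."""
    n = len(b)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def refine(A, b, iterations=5):
    """Mixed-precision iterative refinement: single-precision solves,
    double-precision residuals and updates."""
    x = solve_single(A, b)
    for _ in range(iterations):
        r = residual(A, b, x)        # cheap, done in double precision
        d = solve_single(A, r)       # expensive part, done in single precision
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    x_true = [1 / 3, 2 / 3, -1 / 7]
    b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
    for name, x in (("single only", solve_single(A, b)), ("refined", refine(A, b))):
        err = max(abs(a - t) for a, t in zip(x, x_true))
        print(name, "max error:", err)
```

The point of the design is that the O(n^3) work stays in the fast precision, while only the O(n^2) residual and update steps need the slow one. It only pays off when the matrix is reasonably well conditioned; for badly conditioned systems the refinement may not converge.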