VPAC is looking for an Operations Manager

Don’t panic, this isn’t about me.. 😉 No agencies please!

The Victorian Partnership for Advanced Computing (VPAC) is looking for an Operations Manager:

We are looking for a dynamic leader with excellent IT knowledge to lead and manage our High Performance Computing (HPC) team based at VPAC (housed at RMIT University in Carlton).

Your ideal background would include management of similar teams and strong hands-on experience, coupled with full responsibility for technical infrastructure. The Operations Manager will build and maintain strategic relationships with key stakeholders such as the Victorian universities and national initiatives like ARCS, NCI and ANDS.

Reporting to the Chief Executive Officer, you will be a key member of the VPAC Management Team and lead a growing team of around 15 Systems Administrators and Developers. As VPAC is aiming for industry best practice and holds ISO accreditation, you will be expected to have worked in similar environments that take a process-based approach to IT service management.

A senior-level remuneration package will be negotiated with the successful applicant. To obtain a copy of the position description and/or to apply for this exciting opportunity, please email recruitment@vpac.org.

There is also a copy of the PD on the VPAC employment positions web page.

As ever, please contact VPAC recruitment, not me, about this position..

Belle Monte-Carlo Production on the Amazon EC2 Cloud

A few weeks ago Martin Sevior and Tom Fifield of the University of Melbourne gave a talk at VPAC called “Belle Monte-Carlo production on the Amazon EC2 cloud”, based on a paper they’d presented at the International Conference on Computing in High Energy and Nuclear Physics. The presentation is now available on the VPAC website.

It’s all about testing the cloud computing model, via Amazon EC2, for Monte Carlo production for the SuperBelle experiment at the KEK collider in Japan. My favourite comment is that, for a real full production run on Amazon EC2 to be useful, it would need to be able to return data from S3 to the KEK collider at 600MB/s (~4.7Gb/s) sustained.

I don’t know what Amazon would say to that – well, apart from maybe “no”. 🙂

NB: This is the talk I mentioned in the comments on Joe Landman’s blog post called “Cloudy Issues”.

Comparing Fortran Compilers

I’ve just been testing the Fortran 90 compilers on our AMD quad-core cluster, Tango, using some code that Joe Landman wrote as a test case in January 2008 and the same input file he used. The compilers I’m using are GCC 4.3.3, Intel 11.0.81 and PGI 8.0-3.

For the unoptimised (-O0) version I get:

  • GCC: 1.884s
  • Intel: 3.891s
  • PGI: 1.170s

For basic optimisation (-O) I get:

  • GCC: 1.617s
  • Intel: 3.515s
  • PGI: 0.954s

Cranking up the optimisation to -O2 sees essentially no change:

  • GCC: 1.610s
  • Intel: 3.514s
  • PGI: 0.954s

Now we add compiler specific flags:

  • GCC (-march=amdfam10 -O3): 0.956s
  • Intel (-fast): 3.507s
  • PGI (-fast -tp shanghai-64): 0.997s

That got me wondering which had the greater impact, -O3 or -march=amdfam10, and the result was surprising:

  • GCC (-O3): 1.611s
  • GCC (-march=amdfam10 -O0): 1.238s

So that’s pretty conclusive: just targeting AMD’s family 10h CPUs (i.e. Barcelona/Shanghai processors) with no optimisations gives a better speedup than the highest level of optimisation! Of course it’s better with both, as you can see from the previous set of results.
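If you want to reproduce this, a minimal sketch of the benchmark loop in Python is below. The compiler commands, the flag sets and the source file name (test.f90 here) are placeholders for whatever your setup uses — this is just the shape of the experiment, not the exact script I ran:

    import subprocess
    import time

    # Hypothetical compiler commands and flag sets; substitute your own.
    compilers = {"GCC": "gfortran", "Intel": "ifort", "PGI": "pgf90"}
    flag_sets = ["-O0", "-O", "-O2", "-O3"]
    source = "test.f90"  # placeholder name for the test case

    for name, fc in compilers.items():
        for flags in flag_sets:
            # Build the test case with this compiler and flag set.
            subprocess.run([fc] + flags.split() + ["-o", "a.out", source], check=True)
            # Time the run; assumes the input file is in the current directory.
            start = time.perf_counter()
            subprocess.run(["./a.out"], check=True)
            print(f"{name} {flags}: {time.perf_counter() - start:.3f}s")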

I’m *really* impressed by GCC’s performance there, as well as PGI’s unoptimised speed, and disappointed by the Intel compiler’s general lack of performance. I suspect Intel’s answer would be (not unreasonably) that they don’t necessarily target performance for their competitors’ CPUs.

Wanted: Linux Systems Administrator

The Victorian Partnership for Advanced Computing (VPAC) is looking for a Linux systems administrator to join our systems team working on grid computing.

  • Help build a grid across Australia!
  • Relaxed work environment.
  • Melbourne CBD fringe, easy access to trains and tram.
  • Salary around $55-60K+ (package contingent on experience)
  • Fixed term contract – 12 months.
  • Closing Date: 2nd August 2007

Reporting to the ICI Operations Manager, you will be working primarily in a Linux systems administrator role with grid toolkits such as Globus and VDT. You will be involved in a national project to make grid-based computing available across Australia. The ability to work with and support our end users (typically scientific researchers and software developers) is very important in this role. Some national and international travel will be involved.

So if you think that it sounds interesting then please go and read the job advert on the VPAC website, or at least tell a friend! 🙂

Multi-Core for HPC: Breakthrough or Breakdown?

At SC06 there was a panel discussion on the final day about whether the trend to more and more cores per socket was going to be good or bad for HPC. The feeling was that more cores were inevitable: chip makers need to do something to make up for stagnating clock speeds, and that need happens to coincide with having more and more blank space on the die as transistor sizes shrink.

However, this puts all your memory on the wrong side of the pins from the cores, and HPC will (must) need to find a way to deal with it!
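If you want to see what those pins can actually deliver, here’s a toy STREAM-style triad in Python/NumPy (my own sketch, nothing to do with the panel) that measures sustained memory bandwidth for a kernel doing almost no arithmetic per byte:

    import time
    import numpy as np

    # STREAM-style "triad" (a = b + s*c): a memory-bandwidth-bound kernel.
    # Arrays sized well past any cache: 3 arrays x 8 bytes x 50M = 1.2 GB.
    N = 50_000_000
    a = np.zeros(N)
    b = np.random.rand(N)
    c = np.random.rand(N)

    start = time.perf_counter()
    a[:] = b + 3.0 * c  # nominally two reads and one write per element
    elapsed = time.perf_counter() - start

    # NumPy makes a temporary here, so this slightly understates the
    # true pin traffic; treat the figure as a rough lower bound.
    bytes_moved = 3 * N * 8
    print(f"Triad: {bytes_moved / elapsed / 1e9:.1f} GB/s")

Run one copy per core and on most multi-core boxes the aggregate figure flattens out long before the core count does — which is exactly the problem the panel was worried about.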

The presentations were really good and I was a bit sad that I couldn’t take enough notes, as the session was packed and I was up near the back, but I’ve just found out that all the slides used are up on the web as PDFs, courtesy of the most amiable Thomas Sterling, who chaired the session.

The most illuminating HPC-related quote was from the slides of Steve Scott, talking about how RAM characteristics have changed over the years:

1979 → 1999:

  • 16,000X density increase
  • 640X uniform access BW increase
  • 500X random access BW increase
  • 25X less per-bit memory bandwidth

(That last figure falls straight out of the first two: density grew 16,000/640 = 25 times faster than uniform access bandwidth, so each bit gets 25X less of it.)

My favourite non-HPC quote is from Don Becker’s slides:

My nightmare: An 80 core consumer CPU means your web experience will be 79 3D animated ads roaming over your screen

Be afraid, be very afraid (on both grounds)..

A Rough Guide to Scientific Computing On the Playstation 3

Eugen Leitl has just posted a message to the Beowulf list with a link to a draft of a paper by Alfredo Buttari, Piotr Luszczek, Jakub Kurzak, Jack Dongarra and George Bosilca called A Rough Guide to Scientific Computing On the Playstation 3. It’s a 74-page PDF looking at the possibilities and problems of using the PS3 for scientific computing (there is already a PS3 Linux cluster at NCSU).

The introduction to the paper lets you know that this isn’t going to be easy..

As exciting as it may sound, using the PS3 for scientific computing is a bumpy ride. Parallel programming models for multi-core processors are in their infancy, and standardized APIs are not even on the horizon. As a result, presently, only hand-written code fully exploits the hardware capabilities of the CELL processor. Ultimately, the suitability of the PS3 platform for scientific computing is most heavily impaired by the devastating disproportion between the processing power of the processor and the crippling slowness of the interconnect, explained in detail in section 9.1. Nevertheless, the CELL processor is a revolutionary chip, delivering ground-breaking performance and now available in an affordable package. We hope that this rough guide will make the ride slightly less bumpy.

Of course, it’s unlikely you’re going to see the PS3 being used in production clusters anyway, so the interconnect shouldn’t be such a problem there.. 🙂

The paper covers the hardware, Linux support and how to get it onto a PS3, programming methods and models, MPI, performance, etc. The paper isn’t complete as I write, but it is still a very interesting read. HPC folks will certainly want to read section 9.1, “Limitations of the PS 3 for Scientific Computing”, especially the part that says:

Double precision performance. Peak performance of double precision floating point arithmetic is a factor of 14 below the peak performance of single precision. Computations which demand full precision accuracy will see a peak performance of only 14 Gflop/s, unless mixed-precision approaches can be applied.
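To give a flavour of the mixed-precision approaches mentioned there, here’s a minimal sketch of classic iterative refinement in Python/NumPy — my own illustration, not code from the paper. The trick is to do the expensive solve in fast single precision and then polish the answer with cheap double-precision residuals:

    import numpy as np

    def mixed_precision_solve(A, b, iters=5):
        # Solve Ax = b with a single-precision solver, then refine the
        # answer using double-precision residuals.
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                                # residual in double
            dx = np.linalg.solve(A32, r.astype(np.float32))
            x += dx.astype(np.float64)                   # cheap correction
        return x

    # Quick sanity check against a straight double-precision solve.
    rng = np.random.default_rng(42)
    A = rng.standard_normal((500, 500))
    b = rng.standard_normal(500)
    print(abs(mixed_precision_solve(A, b) - np.linalg.solve(A, b)).max())

A real implementation would factor A32 once and reuse the factors for each correction step rather than re-solving, but the shape is the same: nearly all the flops happen at single precision, where the quote above says the Cell is 14 times faster.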