Filter Senator Conroy (.org)

There’s a website now up called Filter-Conroy.org which aims to persuade people in Victoria to vote below the line at the next federal election to sack Senator Conroy if he does not abandon his wrong-headed plans for mandatory ISP-level censorship, a waste of valuable taxpayers’ funds (which could instead go to the police to fight paedophiles if the government really wanted to achieve something). I strongly commend this site to my fellow Victorian voters.

A (Red) Rising Star in the Latest Top500 Supercomputer List

The 35th Top500 supercomputer list has just been released at ISC2010 in Germany, and it’s got some very interesting things in it.

Firstly, China has just got the #2 system on the Top500 with an nVidia GPU-based cluster called Nebulae. At 1.271PF measured (Rmax) it’s just over 70% of the performance of the current #1 system, Jaguar, but (if you believe it’s worth anything) its theoretical performance of 2.9PF beats Jaguar’s 2.3PF – this means that if they can optimise Linpack some more for the architecture then perhaps they have a shot at overtaking Jaguar and taking #1 (assuming nothing larger comes along in the next 6 months).

Secondly, China has also taken the #7 spot with an AMD GPU-based cluster (notice a pattern here?), now has 24 systems in the Top500, and has overtaken Germany to take the #2 spot in terms of total performance for a country at 2.9PF, though that is still a long way behind the US with over 17PF of total Rmax. I think the Chinese have arrived with a vengeance and I suspect they’re going to carry on boosting their capacity, especially as both of their Top 10 systems are built by Chinese organisations.

Linux continues its domination of the Top500, increasing its share of systems from 446 (89.2%) in November 2009 to 455 (91%) today. Windows has just 5 systems in total, unchanged from the previous list. Tellingly, they appear to be the same 5 as in November, since the stats are unchanged – uptake of Windows HPC at the high end may be stagnating.

Australia has just one system in the Top500, the Bureau of Meteorology / HPCCC Sun^WOracle cluster in Melbourne. It’s ranked at #113 with 49.5TF, which is pretty impressive, though I’m puzzled why its much bigger sibling at NCI/ANU in Canberra didn’t get a mention – perhaps they chose to just get it into production ASAP without faffing around with Linpack? Based on its estimated Rpeak and the efficiency of the BoM machine I reckon it would get an Rmax of about 128TF and place at about #35.

But without the NCI machine Australia ranks behind such well-known HPC countries as Austria and Denmark, and well behind the likes of New Zealand and India!

VLSCI Mid Year Call for Applications from Victorian Life Science Researchers

A quick work-related blog post…

Today VLSCI announced its mid-year Call For Applications for use of the Peak Computing Facility at the University of Melbourne by life science researchers in Victoria. This includes time on our forthcoming IBM Blue Gene/P HPC system as well as the existing SGI Altix XE HPC cluster and a forthcoming IBM iDataPlex HPC cluster (both Intel Nehalem systems).

Pass it on!

Portable Hardware Locality (hwloc) Library v1.0 Released

One of the things that we HPC folks tend to get hot under the collar about is hardware locality: basically making sure that your memory accesses are as fast as possible by optimising where on the system your memory comes from, and making sure your process doesn’t get moved further away from it. Just binding your processes to the cores they are on can make for a significant speed-up, so it’s well worth doing. If you’ve only got a single socket, or a pre-Nehalem Intel x86 system, then your path to RAM is pretty much identical wherever you run, so the only benefit comes from not moving away from your CPU cache lines; but on AMD Opteron, Nehalem, Itanic, Alpha, etc. you really should care a lot more about locality for best performance.
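To make the idea concrete, here’s a minimal sketch (mine, not anything Torque or Open-MPI actually ship) of pinning the current process to a single core on Linux with sched_setaffinity(2); the choice of core 0 is purely for illustration:

    /* Minimal sketch: pin the calling process to CPU 0 on Linux.
     * Batch systems and MPI libraries normally do this for you;
     * this just shows the underlying idea. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);   /* core 0 here; use whichever core you were allocated */

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            exit(EXIT_FAILURE);
        }

        /* From here on the kernel won't migrate us off CPU 0, so our cache
         * lines and (on NUMA boxes) local memory stay close by. */
        printf("bound to CPU 0\n");
        return 0;
    }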

The open source Torque queuing system (which I help out with) does some of this already: if you compile it with --enable-cpuset and have the /dev/cpuset virtual filesystem mounted, then before it starts a job on a node it will create a cpuset for that job (based on which cores have been allocated on the node) and then put the HPC processes into that cpuset. If you’re using Open-MPI 1.4.x and have the environment variable OMPI_MCA_orte_process_binding set to core then each of the MPI ranks will bind itself to one of the cores within that cpuset.
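For the curious, the cpuset side of that boils down to something like the following sketch. This assumes the legacy cpuset filesystem is mounted at /dev/cpuset with the old cpus/mems/tasks file names (newer kernels prefix them with cpuset.), and the job name, core list and NUMA node are made up for illustration:

    /* Rough sketch of what a cpuset-aware batch system does on each node:
     * create a cpuset for the job, restrict it to the allocated cores and
     * memory nodes, then move the job's processes into it.
     * Error handling is trimmed; this needs root and /dev/cpuset mounted. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void write_str(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (f) { fputs(val, f); fclose(f); }
    }

    int main(void)
    {
        char buf[64];

        /* Per-job cpuset directory ("myjob" is a made-up name) */
        mkdir("/dev/cpuset/myjob", 0755);

        write_str("/dev/cpuset/myjob/cpus", "0-3");  /* cores allocated to the job */
        write_str("/dev/cpuset/myjob/mems", "0");    /* NUMA node(s) to allocate from */

        /* Move ourselves (and hence any children we spawn) into the cpuset */
        snprintf(buf, sizeof(buf), "%d", (int)getpid());
        write_str("/dev/cpuset/myjob/tasks", buf);

        return 0;
    }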

All good? Well, not quite: Torque is reliant on /dev/cpuset being there and on being able to parse its contents, and Open-MPI 1.4.x uses the Portable Linux Processor Affinity (PLPA) library which, as its name suggests, is only for Linux. So the good Open-MPI people looked at their PLPA library and decided it needed extending, and teamed up with the INRIA libtopology team, who were working on how to discover the topology of various architectures; the two projects were merged under the banner of the Portable Hardware Locality (hwloc) library.

The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, …) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.

The portable bit of the name comes from the fact that it works on Linux, Solaris, AIX, Darwin, FreeBSD, Tru64, HP-UX and Windows (though with limitations on some platforms – e.g. Windows – which don’t expose all the info it needs) and can be extended for other OSes if people feel they need to scratch that itch (OpenVMS anyone?). This release is also embeddable into projects (such as Open-MPI 1.5), and I have an interest in Torque picking it up to improve and extend its cpuset support.
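To give a feel for it, here’s a small sketch against my understanding of the hwloc 1.0 C API: discover the topology, report the counts of NUMA nodes, sockets, cores and hardware threads, then bind the process to the first core. Treat it as an illustration rather than gospel:

    /* Small sketch using the hwloc 1.0 C API: discover the topology,
     * print a summary, then bind this process to the first core.
     * Compile with something like: cc topo.c -lhwloc */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        printf("NUMA nodes: %d, sockets: %d, cores: %d, hw threads: %d\n",
               hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE),
               hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET),
               hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE),
               hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU));

        /* Bind the whole process to the first core's cpuset */
        hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, 0);
        if (core && hwloc_set_cpubind(topology, core->cpuset, HWLOC_CPUBIND_PROCESS) != 0)
            perror("hwloc_set_cpubind");

        hwloc_topology_destroy(topology);
        return 0;
    }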