I’ve been testing the Fortran 90 compilers on Tango, our AMD quad-core cluster, using some code that Joe Landman wrote as a test case in January 2008 and the same input file he used. The compilers I’m using are GCC 4.3.3, Intel 11.0.81 and PGI 8.0-3.
For the unoptimised (-O0) version I get:
- GCC: 1.884s
- Intel: 3.891s
- PGI: 1.170s
For basic optimisation (-O) I get:
- GCC: 1.617s
- Intel: 3.515s
- PGI: 0.954s
Cranking up the optimisation to -O2 makes essentially no difference:
- GCC: 1.610s
- Intel: 3.514s
- PGI: 0.954s
Now we add compiler specific flags:
- GCC (-march=amdfam10 -O3): 0.956s
- Intel (-fast): 3.507s
- PGI (-fast -tp shanghai-64): 0.997s
That got me wondering which had the greater impact for GCC, -O3 or -march=amdfam10, and the result was surprising:
- GCC (-O3): 1.611s
- GCC (-march=amdfam10 -O0): 1.238s
So that’s pretty conclusive: just targeting the AMD Family 10h (K10) CPUs (i.e. Barcelona/Shanghai processors) with no optimisation gives a better speedup than the highest optimisation level on its own! Of course it’s better with both, as you can see from the previous set of results.
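To put numbers on that comparison, here is a quick sketch that turns the GCC timings above into speedup factors relative to the -O0 run. The timings are copied from this post; the `speedups` helper and the variable names are just mine for illustration.

```python
# Wall-clock times in seconds for GCC 4.3.3, copied from the results above.
gcc_times = {
    "-O0": 1.884,
    "-O3": 1.611,
    "-march=amdfam10 -O0": 1.238,
    "-march=amdfam10 -O3": 0.956,
}

def speedups(times, baseline="-O0"):
    """Return each run's speedup factor relative to the baseline flag set."""
    base = times[baseline]
    return {flags: base / t for flags, t in times.items()}

gcc_speedups = speedups(gcc_times)
for flags, factor in gcc_speedups.items():
    print(f"{flags:22s} {factor:.2f}x")
```

That works out to roughly 1.17x for -O3 alone, 1.52x for -march=amdfam10 alone and 1.97x for the two together. The combined figure is larger than the product of the two individual factors (about 1.78x), so on this one test case the flags appear to reinforce each other rather than simply stack.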
I’m *really* impressed by GCC’s performance there, as well as PGI’s unoptimised speed, and disappointed by the Intel compiler’s general lack of performance. I suspect Intel’s answer would be (not unreasonably) that they don’t necessarily target performance for their competitors’ CPUs.