Comparing Fortran Compilers

I’ve been testing the Fortran 90 compilers on Tango, our AMD quad-core cluster, using a test case that Joe Landman wrote in January 2008 and the same input file he used. The compilers are GCC 4.3.3, Intel 11.0.81 and PGI 8.0-3.

For the unoptimised (-O0) version I get:

  • GCC: 1.884s
  • Intel: 3.891s
  • PGI: 1.170s

For basic optimisation (-O) I get:

  • GCC: 1.617s
  • Intel: 3.515s
  • PGI: 0.954s

Cranking up the optimisation to -O2 makes essentially no difference:

  • GCC: 1.610s
  • Intel: 3.514s
  • PGI: 0.954s

Now we add compiler-specific flags:

  • GCC (-march=amdfam10 -O3): 0.956s
  • Intel (-fast): 3.507s
  • PGI (-fast -tp shanghai-64): 0.997s
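To put the four sets of numbers side by side, here is a rough Python sketch that works out how much run time each compiler sheds between -O0 and its compiler-specific flags, and which setting was actually fastest. The timings are copied from the lists above; the dictionary layout, the "specific" label and the function name are just illustrative choices.

```python
# Wall-clock times in seconds, copied from the results above.
# "specific" stands for the compiler-specific flags in the last list.
timings = {
    "GCC":   {"-O0": 1.884, "-O": 1.617, "-O2": 1.610, "specific": 0.956},
    "Intel": {"-O0": 3.891, "-O": 3.515, "-O2": 3.514, "specific": 3.507},
    "PGI":   {"-O0": 1.170, "-O": 0.954, "-O2": 0.954, "specific": 0.997},
}


def reduction_percent(baseline, tuned):
    """Percentage of the baseline run time removed by the tuned build."""
    return 100.0 * (baseline - tuned) / baseline


for compiler, t in timings.items():
    saved = reduction_percent(t["-O0"], t["specific"])
    fastest = min(t, key=t.get)  # first setting with the lowest time
    print(f"{compiler}: {saved:.1f}% less time than -O0 with specific flags; "
          f"fastest setting: {fastest} ({t[fastest]:.3f}s)")
```

One thing this makes visible is that PGI's fastest time in these runs came from plain -O/-O2 (0.954s) rather than from -fast -tp shanghai-64 (0.997s).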

That got me wondering which had the greater impact, -O3 or -march=amdfam10, and the result was surprising:

  • GCC (-O3): 1.611s
  • GCC (-march=amdfam10 -O0): 1.238s

So that’s pretty conclusive: just targeting the AMD K10 (family 10h) CPU, i.e. the Barcelona/Shanghai processors, with no optimisation at all gives a better speedup than -O3 on its own! Of course it’s better with both, as you can see from the previous set of results.
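As a quick check on that comparison, the GCC timings can be turned into speedup factors relative to the plain -O0 build. This is only a small sketch in Python using the numbers reported in this post; the variable and function names are made up for illustration.

```python
# GCC wall-clock times in seconds, taken from the results in this post.
gcc_times = {
    "-O0": 1.884,
    "-O3": 1.611,
    "-march=amdfam10 -O0": 1.238,
    "-march=amdfam10 -O3": 0.956,
}


def speedup(baseline, tuned):
    """How many times faster the tuned build runs than the baseline."""
    return baseline / tuned


baseline = gcc_times["-O0"]
for flags, seconds in gcc_times.items():
    print(f"{flags:22s} {seconds:.3f}s  {speedup(baseline, seconds):.2f}x")
```

On these figures -O3 alone is worth roughly 1.17x, -march=amdfam10 alone roughly 1.52x, and the two together roughly 1.97x over -O0.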

I’m *really* impressed by GCC’s performance there, as well as by PGI’s unoptimised speed, and disappointed by the Intel compiler’s general lack of performance. I suspect Intel’s answer would be (not unreasonably) that they don’t necessarily target performance on their competitors’ CPUs.
