GCC 4.4.1 faster than GCC 3.4.6 and 4.3.3

It's amazing how few developers keep up with the latest version of their compiler. Not only do bugs get fixed, but staying up to date is an easy way to gain free performance. The table below shows timings (in seconds) for a single-threaded, mixed integer and floating-point, heavily templated C++ application running on a Xeon X5355 processor, compiled with three versions of GCC. For this application, simply switching compilers pays off: the 4.4.1 build runs about 1.5 times as fast as the 3.4.6 build.

GCC version      3.4.6    4.3.3    4.4.1
                ------   ------   ------
Trial 1         467.58   340.05   309.88
Trial 2         436.73   380.04   277.64
Trial 3         429.33   351.14   290.84
                ------   ------   ------
Average         444.55   357.08   292.79
Speedup           1.00     1.24     1.52
(rel. to 3.4.6)
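
For anyone repeating this kind of comparison, the Average and Speedup rows are just the mean of the trials and the ratio of the baseline mean to the candidate mean. A minimal sketch of that arithmetic follows; the function names mean_seconds and speedup are mine, chosen for illustration, and are not part of the benchmarked application.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean wall-clock time over a set of trials, in seconds.
double mean_seconds(const std::vector<double>& trials) {
    assert(!trials.empty());  // a mean over zero trials is undefined
    return std::accumulate(trials.begin(), trials.end(), 0.0) / trials.size();
}

// Speedup of a candidate build relative to a baseline build:
// values above 1.0 mean the candidate is faster.
double speedup(const std::vector<double>& baseline,
               const std::vector<double>& candidate) {
    return mean_seconds(baseline) / mean_seconds(candidate);
}
```

Feeding in the three 3.4.6 timings as the baseline and the three 4.4.1 timings as the candidate reproduces the 1.52 figure in the table, to two decimal places.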

To compile this application, I used the flags "-g -funroll-loops -march=core2 -O3 -Wall" for 4.3.3 and 4.4.1, and "-g -funroll-loops -O3 -Wall" for 3.4.6 (which lacks the "-march=core2" option). For general use, I'd suggest -O2; add -march=core2 if you know your application will only run on newish hardware, and add -funroll-loops for applications with heavy STL use. If you are doing heavy floating-point arithmetic, check out the -ffast-math option, but be sure you are comfortable with what it enables and disables, since it relaxes strict IEEE floating-point behavior. A recent Markov Chain Monte Carlo (MCMC) simulation application I examined ran 30% faster simply by enabling -ffast-math.
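
To make the -ffast-math caveat concrete, here is a small sketch of code that depends on exact floating-point rounding: Kahan (compensated) summation. This is an illustration I am adding, not code from the applications mentioned above, and the function names are hypothetical. Among other things, -ffast-math lets GCC reassociate floating-point expressions, and under reassociation the compensation term below is algebraically zero, so the optimizer may remove it and quietly turn the routine back into a naive sum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Plain left-to-right accumulation in single precision.
float naive_sum(float value, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) sum += value;
    return sum;
}

// Kahan (compensated) summation: carries the rounding error of each
// addition forward so it is not lost.
float kahan_sum(float value, std::size_t count) {
    float sum = 0.0f;
    float comp = 0.0f;  // running estimate of the low-order bits lost so far
    for (std::size_t i = 0; i < count; ++i) {
        float y = value - comp;
        float t = sum + y;
        comp = (t - sum) - y;  // zero in exact arithmetic; nonzero only due to rounding
        sum = t;
    }
    return sum;
}

// Self-check meant to be built with the same flags as the real application:
// ten million additions of 0.1f should land within 1.0 of one million.
void check_compensation_survives() {
    const float k = kahan_sum(0.1f, 10000000);
    assert(std::fabs(k - 1.0e6f) < 1.0f);
}
```

Built without -ffast-math, the compensated sum stays very close to one million while the naive sum drifts by several percent. Whether a particular GCC version actually removes the compensation under -ffast-math depends on the optimizer, so a check like this is worth running under the exact flags you intend to ship.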

Options for compiling 32-bit applications are more complicated, especially for floating-point-heavy applications, but I discourage building scientific applications as 32-bit. More registers and improved memory access are available on 64-bit Intel platforms, but only if the application is compiled 64-bit. As a result, many scientific applications get a performance boost simply from being compiled 64-bit.
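
If you are unsure which way a build came out, the binary can report it. The sketch below is my own illustration with hypothetical function names; it relies only on sizeof(void*) and on __x86_64__, a macro GCC predefines when targeting x86-64.

```cpp
#include <cassert>
#include <climits>

// Width of a pointer in bits for the current build:
// 64 for a typical 64-bit build, 32 for a 32-bit build.
inline int pointer_bits() {
    const int bits = static_cast<int>(sizeof(void*) * CHAR_BIT);
    assert(bits == 32 || bits == 64);  // the only widths discussed here
    return bits;
}

// True when the compiler targeted x86-64, based on the predefined
// macro __x86_64__; false on 32-bit x86 and on other architectures.
inline bool built_for_x86_64() {
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}
```

Printing pointer_bits() at startup, or logging it alongside timing results, is a cheap way to confirm that a benchmark really compared the builds you think it did.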