GCC version                 3.4.6    4.3.3    4.4.1
                           ------   ------   ------
Trial 1                    467.58   340.05   309.88
Trial 2                    436.73   380.04   277.64
Trial 3                    429.33   351.14   290.84
                           ------   ------   ------
Average                    444.55   357.08   292.79
Speedup (rel. to 3.4.6)      1.00     1.24     1.52
To compile this application, I used the flags "-g -funroll-loops -march=core2 -O3 -Wall" for GCC 4.3.3 and 4.4.1, and "-g -funroll-loops -O3 -Wall" for 3.4.6, which lacks the -march=core2 option. For general use, I'd suggest starting with -O2. Add -march=core2 only if you know the application will run exclusively on Core 2 or newer processors, since the resulting binary may not run on older CPUs, and add -funroll-loops for applications that make heavy use of the STL. If your application does heavy floating-point arithmetic, look at the -ffast-math option, but make sure you are comfortable with what it enables and disables: it relaxes strict IEEE floating-point semantics, for example by allowing operations to be reassociated and by assuming that no NaNs or infinities occur. A Markov chain Monte Carlo (MCMC) simulation I examined recently ran 30% faster simply by enabling -ffast-math.
Options for compiling 32-bit applications are more complicated, especially for floating-point-heavy code (for example, choosing between x87 and SSE arithmetic with -mfpmath). However, I discourage building scientific applications as 32-bit at all. When an application is compiled 64-bit, 64-bit Intel platforms provide twice as many general-purpose and SSE registers (16 of each instead of 8), which reduces spilling values to memory, along with improved memory access. As a result, many scientific applications get a performance boost simply from being compiled 64-bit.