To test HT, we ran a varying number of concurrent instances of a CPU-intensive application on two machines, each with a single X5570 processor, one with HT enabled and the other with HT disabled. The test program fits in cache, so there should be little contention for memory bandwidth. The graph below summarizes the findings using solutions/sec generated by our test program as the performance metric. Measurements from our HT-enabled machine are marked HT, and measurements from our HT-disabled machine are marked NHT (No HT).
Given that the X5570 is a quad-core processor, we expect a pretty consistent number of solutions/sec per process up to and including four processes. There is actually a very slight improvement when running only a single process, and perhaps this is attributable to Turbo Boost. Up to and including four processes there is also no difference between having HT enabled or disabled, so there is no penalty to having HT enabled.
If we define two functions that estimate the per-process solution rate for HT-enabled and HT-disabled processors, RateHT(n) and Rate(n), where n is the number of processes, then so far we have Rate(x) = RateHT(x) and Rate(x) = Rate(x-1) for 2 <= x <= 4, normalized so that Rate(1) = 1. Move to eight and sixteen processes and things get more interesting. Each process on the HT-disabled processor runs at about half the previous rate: Rate(8) = Rate(4)/2 and Rate(16) = Rate(8)/2. On the HT-enabled processor, however, RateHT(8) = 1.3 × RateHT(4)/2 and RateHT(16) = RateHT(8)/2.
So HT provides a definite advantage once the process count exceeds the physical core count. While each process runs slower after exceeding the number of physical cores, the total throughput of the processor is roughly 30% higher with HT enabled than with it disabled. I'll take that.
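To make the arithmetic behind that 30% figure explicit, here is a minimal sketch that plugs the relations above into a few lines of Python. It is a model of the approximate rates described in this post, not the raw measurements; the names `rate_nht`, `rate_ht`, `HT_GAIN` and `throughput` are made up for illustration, and total throughput is assumed to be simply the process count times the per-process rate.

```python
# Model of the per-process rates described above, normalized so Rate(1) = 1.
# Only the process counts discussed in the post (1-4, 8, 16) are filled in.
CORES = 4      # physical cores on the X5570
HT_GAIN = 1.3  # approximate factor observed at 8 processes with HT enabled

# HT disabled: flat up to 4 processes, then halving at 8 and again at 16.
rate_nht = {n: 1.0 for n in range(1, CORES + 1)}
rate_nht[8] = rate_nht[4] / 2
rate_nht[16] = rate_nht[8] / 2

# HT enabled: same up to 4 processes, then 1.3x the halved rate at 8.
rate_ht = {n: 1.0 for n in range(1, CORES + 1)}
rate_ht[8] = HT_GAIN * rate_ht[4] / 2
rate_ht[16] = rate_ht[8] / 2


def throughput(rates, n):
    """Total normalized throughput: n processes, each at rates[n]."""
    return n * rates[n]


for n in (4, 8, 16):
    nht = throughput(rate_nht, n)
    ht = throughput(rate_ht, n)
    print(f"{n:2d} processes: NHT {nht:.2f}, HT {ht:.2f}, HT/NHT {ht / nht:.2f}")
```

Under this model the HT-disabled machine stays at a total of 4 normalized units no matter how many processes are queued past four, while the HT-enabled machine reaches about 5.2, which is where the roughly 30% comes from.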
HT has returned with Intel's Nehalem-based processors, and there appears to be good reason to enable it, even when running CPU-intensive applications.
