Computer Chess Club Archives

Subject: Re: Memory benchmark comparison DDR333 vs RDRAM PC1066 !

Author: Robert Hyatt
Date: 08:46:05 12/03/02

On December 02, 2002 at 23:21:23, Matt Taylor wrote:

><snip>
>>>Does hyper-threading really help that much? It seems like it would create more
>>>contention for limited resources (decoder, internal u-op cache, even some
>>>execution units). I would be extremely interested in seeing hyperthreading
>>>benchmarks with Crafty.
>>
>>
>>First, if you look at the concept of trace-cache, it is _behind_ the decoder,
>>and all it stores are decoded instructions (micro-ops).  Since Crafty uses the
>>_same_ code in all threads, it is likely that the shared L1 I-cache (and the
>>L1 D-cache and L2 cache) will all contain stuff that is useful across the two
>>threads...
>
>Yes, but the figures Intel lists are 1 instruction decoded per cycle and up to 3
>supplied by the trace cache. I suppose hyperthreading would make no sense unless
>they doubled the front-end of the pipeline (the trace cache and decoder).
>
>Come to think of it, the P4 Xeon may very well see enormous gains from
>hyperthreading as it would unlock the full potential of the chip. P4 from the
>start has been limited to at most 3 ops/cycle from the trace cache assuming that
>the code you want is actually IN the trace cache. It is equipped with 7
>execution units. However, two of the ALUs are double-pumped allowing for up to 5
>simple ALU ops/cycle (total of 9 ops/cycle). It should be painfully clear that
>under no circumstances can the full 5 ops/cycle be used -- by a single trace
>cache, anyway. If they have a second trace cache, P4 Xeon may very well see
>nearly twice the performance in hyperthreading...
>
>>Eugene already ran some and posted the results.  The raw NPS went up by a
>>factor of 1.3X.  I think more can be had but at a couple of critical places
>>where I have a "busy spin" I need to insert a "pause" asm instruction so that
>>the cpu will work on the thread doing useful work if there is a choice...
>
>How can you use the hlt instruction? It's privileged, and you're in ring 3.

Not "halt".  "pause".  It is a no-op on non-hyperthreaded CPUs, and all it does
is cause the internal "thread scheduler" to execute the other thread until it
blocks.
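
As a rough illustration of the "busy spin" case, here is a minimal sketch of a spin-wait loop that issues the pause hint on each iteration. It is not Crafty's actual code: the names cpu_relax and spin_until_set, and the max_spins give-up limit, are hypothetical and exist only for this example. It assumes a GCC- or Clang-style compiler, where __builtin_ia32_pause() emits the instruction on x86; on other targets the macro expands to nothing.

```c
#include <stdatomic.h>

/* cpu_relax(): emit the x86 "pause" spin-wait hint where available.
 * __builtin_ia32_pause() is a GCC/Clang builtin (an assumption about the
 * toolchain); on other compilers or architectures this is a no-op. */
#if (defined(__i386__) || defined(__x86_64__)) && \
    (defined(__GNUC__) || defined(__clang__))
#  define cpu_relax() __builtin_ia32_pause()
#else
#  define cpu_relax() ((void)0)
#endif

/* Hypothetical helper, not taken from Crafty: spin until *flag becomes
 * nonzero, executing "pause" on every iteration so a sibling logical
 * processor can make progress. Returns the number of iterations spent
 * waiting, or -1 if max_spins iterations passed without the flag being set. */
long spin_until_set(atomic_int *flag, long max_spins)
{
    long spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (spins >= max_spins)
            return -1;          /* gave up waiting */
        cpu_relax();            /* the "pause" hint goes inside the spin */
        spins++;
    }
    return spins;
}
```

A worker thread would call spin_until_set(&work_ready, limit) while another thread sets work_ready with a release store. The give-up limit is only there so the sketch cannot hang; a real engine would pick its own policy.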

>
>Intel claims about 30-40% speed gains from hyperthreading, but that makes the
>assumption that different instructions are utilized across different types of
>applications. I would also guess that it falls into a sort of resonance where
>one application is doing its heavy computation while the other utilizes the
>memory bus.

Correct...  Or anything that makes _both_ threads block a fair amount, such as
waiting on memory, on results from a memory-mapped read to a device controller,
etc...  if both are blocking, they interleave nicely and go 2x faster.

>
>I'm not sure if the P3 ever shipped with hyperthreading, though I recall hearing
>about it in the days of the P3. The last Intel chip I bought was my Pentium 120.
>Has it been tested on a P3 Xeon with hyperthreading by any chance?

No idea.  I _think_ it started with the PIV, but am not sure.  The CPUID
instruction will give an indication in the processor capability bitmap it
returns...
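
For anyone who wants to check this, a minimal sketch of the CPUID query follows. It assumes GCC or Clang on x86, where <cpuid.h> provides __get_cpuid(); the function names htt_bit_set and cpu_reports_htt are hypothetical. The flag in question is bit 28 of EDX for CPUID leaf 1 (HTT). Note that this bit only says the package can report more than one logical processor; it does not by itself prove that hyper-threading is enabled.

```c
#if defined(__i386__) || defined(__x86_64__)
#  include <cpuid.h>    /* GCC/Clang helper header (toolchain assumption) */
#endif

/* CPUID leaf 1, EDX bit 28: the HTT feature flag. */
#define CPUID1_EDX_HTT (1u << 28)

/* Hypothetical helper: test the HTT bit in an EDX value from CPUID leaf 1. */
int htt_bit_set(unsigned int edx)
{
    return (edx & CPUID1_EDX_HTT) != 0;
}

/* Hypothetical helper: returns 1 if the CPU reports the HTT flag,
 * 0 if it does not, and -1 if CPUID leaf 1 cannot be queried here
 * (for example, on a non-x86 build). */
int cpu_reports_htt(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return htt_bit_set(edx);
#else
    return -1;
#endif
}
```

Calling cpu_reports_htt() once at startup is enough; the result does not change while the program runs.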