Computer Chess Club Archives


Subject: Re: Memory benchmark comparison DDR333 vs RDRAM PC1066 !

Author: Matt Taylor

Date: 20:21:23 12/02/02



<snip>
>>Does hyper-threading really help that much? It seems like it would create more
>>contention for limited resources (decoder, internal u-op cache, even some
>>execution units). I would be extremely interested in seeing hyperthreading
>>benchmarks with Crafty.
>
>
>First, if you look at the concept of trace-cache, it is _behind_ the decoder,
>and all it stores are decoded instructions (micro-ops).  Since Crafty uses the
>_same_ code in all threads, it is likely that the shared L1 I-cache (and the
>L1 D-cache and L2 cache) will all contain stuff that is useful across the two
>threads...

Yes, but the figures Intel lists are 1 instruction decoded per cycle and up to 3
micro-ops supplied per cycle by the trace cache. I suppose hyperthreading would
make no sense unless they doubled the front end of the pipeline (the trace cache
and decoder).

Come to think of it, the P4 Xeon may very well see enormous gains from
hyperthreading, as it would unlock the full potential of the chip. From the
start, the P4 has been limited to at most 3 ops/cycle from the trace cache,
assuming that the code you want is actually IN the trace cache. It is equipped
with 7 execution units. However, two of the ALUs are double-pumped, allowing for
up to 5 simple ALU ops/cycle (a total of 9 ops/cycle). It should be painfully
clear that under no circumstances can the full 5 ALU ops/cycle, let alone all 9,
be used -- by a single trace cache, anyway. If they have a second trace cache,
the P4 Xeon may very well see nearly twice the performance with hyperthreading...
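
To put rough numbers on that argument, here is a back-of-the-envelope sketch in
Python. It uses only the figures quoted above (7 execution units, 2 of them
double-pumped, at most 3 ops/cycle from one trace cache). The function names are
made up for illustration, and the "second trace cache" case is purely the
hypothetical from this post, not something Intel has documented here. It also
ignores dependencies, cache misses and port conflicts, so treat it as an upper
bound on what the front end could feed, not a performance prediction.

```python
# Peak-throughput arithmetic using only the figures quoted in the post.
EXECUTION_UNITS = 7      # execution units, as stated above
DOUBLE_PUMPED_ALUS = 2   # ALUs that can complete two simple ops per cycle
TRACE_CACHE_SUPPLY = 3   # max ops delivered per cycle by one trace cache


def peak_ops_per_cycle(units: int, double_pumped: int) -> int:
    """Each unit completes one op per cycle; a double-pumped unit adds one more."""
    return units + double_pumped


def supply_fraction(supply: int, peak: int) -> float:
    """Fraction of peak execution capacity that the front end can feed."""
    return min(supply, peak) / peak


peak = peak_ops_per_cycle(EXECUTION_UNITS, DOUBLE_PUMPED_ALUS)
one_front_end = supply_fraction(TRACE_CACHE_SUPPLY, peak)
# Hypothetical: a second trace cache doubling the supply (an assumption).
two_front_ends = supply_fraction(2 * TRACE_CACHE_SUPPLY, peak)

print(f"peak execution capacity: {peak} ops/cycle")
print(f"one trace cache can feed  {one_front_end:.0%} of it")
print(f"two trace caches can feed {two_front_ends:.0%} of it")
```

Even with the supply doubled, the front end in this toy model still cannot
saturate the execution units, which is why "nearly twice" is at least
plausible on paper.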

>Eugene already ran some and posted the results.  The raw NPS went up by a
>factor of 1.3X.  I think more can be had but at a couple of critical places
>where I have a "busy spin" I need to insert a "pause" asm instruction so that
>the cpu will work on the thread doing useful work if there is a choice...

How can you use the hlt instruction? It's privileged, and you're in ring 3.

Intel claims about 30-40% speed gains from hyperthreading, but that assumes the
two threads use different mixes of instructions (and so different execution
units), as you would get from running different types of applications side by
side. I would also guess that it falls into a sort of resonance where one
application is doing its heavy computation while the other utilizes the memory
bus.

I'm not sure if the P3 ever shipped with hyperthreading, though I recall hearing
about it in the days of the P3. The last Intel chip I bought was my Pentium 120.
Has it been tested on a P3 Xeon with hyperthreading by any chance?




Last modified: Thu, 15 Apr 21 08:11:13 -0700
