Computer Chess Club Archives


Subject: Re: DIEP NUMA SMP at P4 3.06Ghz with Hyperthreading

Author: Robert Hyatt

Date: 11:33:47 12/13/02


On December 13, 2002 at 14:08:29, Vincent Diepeveen wrote:

>Hello,
>
>Here are some test results of DIEP, thanks to Chad Cowan, on an
>Asus motherboard with HT turned on (amazingly, it is no longer
>called SMT; I forgot which manufacturer calls it HT and
>which one SMT. I guess it's Hyperthreading now for Intel).

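It is _both_.  SMT (simultaneous multithreading) is the generic name for the
technique and HT (Hyper-Threading) is Intel's name for its implementation.  You
can find either term listed on Intel's web site.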


>
>HT turned on in all cases:
>
>bus 533Mhz memory 133Mhz (DDR SDRAM cas2)
>single cpu P4 3.105Ghz (bus 135 Mhz by default, not 133) : 101394
>single cpu P4 3.105Ghz now 2 processes DIEP              : 120095
>
>So a speedup of about 18% for HT. Not bad. Not good either, knowing DIEP
>hardly locks.

It isn't just a lock issue.  If both threads are banging on memory, the cpu
can't run much faster, because the memory reads are still serialized and they
are slow.
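
For reference, the 18% figure follows directly from the two node rates quoted above. A minimal Python sketch (the function name is made up for illustration; the numbers are the ones from the post):

```python
def speedup_percent(base_nps, new_nps):
    """Percentage gain of new_nps over base_nps."""
    return (new_nps / base_nps - 1.0) * 100.0

single_process_nps = 101394  # P4 3.105Ghz, one DIEP process, HT on
two_process_nps = 120095     # same cpu, two DIEP processes, HT on

ht_gain = speedup_percent(single_process_nps, two_process_nps)
print(f"HT speedup: {ht_gain:.1f}%")
```

This only measures nodes per second, not time to depth, which is a separate question for a parallel search.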


Hmm.. Aren't you the same person that was saying "hyper-threading doesn't
work" and "hyper-threading only works on machines that won't be available
for 1-2 years in the future"??  And that "Nalimov is running on a machine
that nobody can buy"  and "the 2.8ghz xeon doesn't support hyper-threading"?
and so forth???



>
>However, there is one problem I have with it when I compare that speed
>with the same version on a 2.4Ghz Northwood.
>
>That 2.4Ghz Northwood is exactly as fast as a K7 at 1.6Ghz.
>
>Now the same K7 same version logs:
>    single cpu : 82499
>    dual       : 154293
>
>Note that the k7 has way way slower RAM and chipset. 133Mhz registered cas 2.5
>i guess versus fast cas 2 (like 2T less for latency, so 10 versus 12T or
>something) for the P4. The P4 was a single cpu.
>
>But here is the math, for those still reading who find it
>interesting.
>
>Single cpu speed difference is:
>  P4 3.06Ghz is faster : 22.9%
>
>Based upon the speed where it is clocked at (3105Mhz)
>we would expect a speedup of 3.105 / 2.4 = 29.4%
>
>So somehow we lose around 7% in the process.

Memory is no faster.  So there is going to be a loss every time the cpu clock is
ramped up a notch.  Always has been.  Always will be until DRAM disappears.
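
One rough way to put a number on that loss is to assume run time splits into a part that scales with the clock and a part (memory stalls) that does not. That split is my assumption for illustration, not something measured here, and it also takes at face value the quoted claim that the K7 1.6Ghz equals a 2.4Ghz P4. A minimal Python sketch:

```python
def memory_bound_fraction(clock_ratio, observed_speedup):
    """Fraction m of baseline run time that does not scale with clock.

    Assumed model: 1/speedup = (1 - m)/clock_ratio + m
    """
    inv_speedup = 1.0 / observed_speedup
    inv_ratio = 1.0 / clock_ratio
    return (inv_speedup - inv_ratio) / (1.0 - inv_ratio)

clock_ratio = 3.105 / 2.4          # expected gain if everything scaled
observed_speedup = 101394 / 82499  # measured single-cpu node rates

m = memory_bound_fraction(clock_ratio, observed_speedup)
print(f"fraction of time not scaling with clock: {m:.2f}")
```

With the figures from this thread the fraction comes out a bit under 0.2, and under this model it grows as a share of run time every time the clock goes up and memory does not.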

>
>Now it wins another 18% or so when it is run with 2 processes.
>If I compare that with a single cpu K7 to get the relative
>speed of a P4 Ghz versus a K7 Ghz, then we get the following comparison:
>
>1.6Ghz * (120k / 82k) = 2.33Ghz
>
>So a 2.33Ghz K7 should be as fast as this P4 at 3.105Ghz.
>Of course assuming linear scaling.
>
>Now we calculate how many P4 Ghz 1Ghz of K7 compares to:
>3.105 / 2.33 = 1.33
>
>So DDR RAM proves to be the big winner for the P4. SMT in itself
>is just a trick that works for me because my parallelism is
>pretty ok, and most likely not for everyone.

Works just fine for me too, as I have already reported and as have Eugene
and others...
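
The clock-equivalence arithmetic quoted above can be redone with the unrounded node counts. A small Python sketch (variable and function names are mine; it keeps the quoted assumption of linear scaling with clock):

```python
def equivalent_clock_ghz(ref_clock_ghz, ref_nps, target_nps):
    """Clock the reference cpu would need to match target_nps,
    assuming node rate scales linearly with clock."""
    return ref_clock_ghz * target_nps / ref_nps

k7_clock_ghz = 1.6
k7_single_nps = 82499
p4_clock_ghz = 3.105
p4_ht_nps = 120095  # two processes on one P4 with HT on

k7_equiv_ghz = equivalent_clock_ghz(k7_clock_ghz, k7_single_nps, p4_ht_nps)
p4_ghz_per_k7_ghz = p4_clock_ghz / k7_equiv_ghz

print(f"K7 equivalent clock: {k7_equiv_ghz:.2f}Ghz")
print(f"P4 Ghz per K7 Ghz:   {p4_ghz_per_k7_ghz:.2f}")
```

The result agrees with the 2.33Ghz and 1.33 figures in the quoted text. Note that it compares a P4 running two processes against a K7 running one, so it folds the HT gain into the ratio.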

>
>Now of course it's questionable whether that 18% speedup in nodes
>per second also results in an actual speedup in ply depth.
>
>For DIEP it does, but it's not so impressive at all.

Nope.  But that is not a processor issue, that is a search issue.  The _cpu_ is
faster with SMT on.  Just because a chess engine can't use that very well
doesn't mean that other applications without the search overhead issue won't
benefit, and in fact they do benefit pretty well...

The interesting thing I have noted is that the SMT benefit just about offsets my
parallel search overhead for the typical case.   If I run a single thread on my
2.8 xeon, I get a search time of X.  If I run four threads, to use both cpus
with HT enabled, I get a search time of very close to X/2.  The 20-30% speedup
by HT is just about what it takes to offset the extra search overhead caused by
the extra processor.  Which means that for the time being, it is possible to
search almost exactly twice as fast using two cpus, although this comparison
is not exactly a correct way to compare things.
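
To make the offsetting effect concrete, here is a simplified Python sketch. It assumes the HT gain acts as a plain multiplier on raw node rate and the parallel search overhead as a plain inflation of the node count; both are simplifications of mine, and the 25% values are just the midpoint of the 20-30% range above, not measurements.

```python
def time_speedup(cpus, ht_gain, search_overhead):
    """Time-to-solution speedup over one thread on one cpu.

    cpus            -- number of physical processors
    ht_gain         -- extra raw node rate per cpu from HT (0.25 = 25%)
    search_overhead -- extra nodes searched by the parallel search
                       (0.25 = 25% more nodes than the serial search)
    """
    raw_nps_scale = cpus * (1.0 + ht_gain)
    return raw_nps_scale / (1.0 + search_overhead)

# Two cpus, HT on: the HT gain cancels the overhead, giving X/2.
print(time_speedup(cpus=2, ht_gain=0.25, search_overhead=0.25))

# Two cpus, HT off, same assumed overhead: noticeably less than 2.
print(time_speedup(cpus=2, ht_gain=0.0, search_overhead=0.25))
```

In practice the overhead with four threads is not the same as with two, so treat this as an illustration of why the two effects can cancel, not as a prediction.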

>
>Take a dual Xeon 2.8Ghz, for which I will also assume a ratio
>of 1.4 (assuming not cas2 DDR RAM but of course ECC registered,
>which eats extra time)

However the xeon has 2-way memory interleaving which runs the bandwidth way up
compared to the desktop PIV system.


>
>That means that the equivalent K7 will be a dual K7 2.0Ghz, thereby
>still not taking into account 3 things
>
>  a) my diep version was msvc compiled with processorpack (sp4)
>     so it was simply not optimized for K7 at all, but more for p4
>     than it was optimized for K7. Not using MMX of course (would
>     slow down on P4 and let the K7 look relatively better).
>  b) speedup at 4 processors is a lot worse than at 2 processors
>     so when i run diep with 4 processes at the dual Xeon 2.8
>     the expectation is that the K7 dual 2.0 Ghz will outgun it
>     by quite some margin.
>  c) that dual k7 2.0Ghz is less than half the price of a dual P4 2.8Ghz
>

There are no dual PIV's at the moment.  Only dual xeons.  Xeons are _not_
PIV's....  For several reasons that can be found on the Intel web site.  That's
why xeons are considered to be their "server class chips" while the PIV is their
"desktop class chip".



>Best regards,
>Vincent



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.