Author: Robert Hyatt
Date: 21:20:20 09/02/02
Ok... I compiled crafty using the same compiler, same everything, except
that the SMP version was compiled with -DSMP and -DCPUS=4 and the non-SMP
version was not.
I did a depth 13 search on a normal chess position...
Here are the statistics for the non-SMP version:
time=1:52 cpu=100% mat=0 n=42914531 fh=91% nps=382k
ext-> chk=341379 cap=125971 pp=14429 1rep=16160 mate=680
predicted=0 nodes=42914531 evals=11140060
endgame tablebase-> probes done=0 successful=0
And here are the stats for the SMP version:
time=1:52 cpu=99% mat=0 n=42914531 fh=91% nps=381k
ext-> chk=341379 cap=125971 pp=14429 1rep=16160 mate=680
predicted=0 nodes=42914531 evals=11140060
endgame tablebase-> probes done=0 successful=0
SMP-> split=0 stop=0 data=0/64 cpu=1:52 elap=1:52
Hmmm... 1K nodes per second slower for the SMP version. 382K vs 381K.
Now exactly what is the percentage difference??? Is that about a .26% difference
that my calculator is showing? That is a _huge_ difference (your words)???
:)
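For anyone who wants to check that percentage, here is a minimal Python sketch. The function name is mine, and the inputs are the rounded nps figures printed above (382k and 381k), so the result is only as precise as that rounding; it comes out at roughly a quarter of a percent.

```python
def percent_slower(baseline_nps, other_nps):
    """Return how much slower other_nps is than baseline_nps, in percent."""
    return (baseline_nps - other_nps) / baseline_nps * 100.0

# Rounded values taken from the two single-cpu runs above.
NON_SMP_NPS = 382_000   # nps=382k, non-SMP build
SMP_1CPU_NPS = 381_000  # nps=381k, SMP build running on one cpu

OVERHEAD_PCT = percent_slower(NON_SMP_NPS, SMP_1CPU_NPS)
print(f"SMP build is {OVERHEAD_PCT:.2f}% slower in raw NPS")
```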
I hate to confuse the discussion with _real_ data, but just for more
confusion, here is the 2-cpu run, same SMP version as above, run on my
quad xeon...
time=1:00 cpu=199% mat=0 n=43627532 fh=91% nps=722k
ext-> chk=338530 cap=128336 pp=13443 1rep=15836 mate=763
predicted=0 nodes=43627532 evals=11305256
endgame tablebase-> probes done=0 successful=0
SMP-> split=258 stop=21 data=5/64 cpu=2:00 elap=1:00
I'll do the math since you are incapable of doing it correctly.
In terms of raw NPS, divide the 2-cpu SMP NPS by the non-SMP NPS, which
is 722/382...
I get 1.89X faster NPS.
In terms of pure speedup, the one-cpu test took 112 seconds, the 2-cpu
SMP version took 60 seconds.
I get a speedup of 1.87X there.
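The two ratios above can be reproduced with a short Python sketch. It assumes the time field is either minutes:seconds (1:52, which the post reads as 112 seconds) or plain seconds with a fraction (33.32); the helper names are hypothetical and not part of crafty.

```python
def to_seconds(field):
    """Convert a time field such as '1:52' or '33.32' to seconds.

    Assumption: colon-separated parts are base-60 (h:mm:ss or m:ss),
    and a field without a colon is already in seconds.
    """
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def ratio(numerator, denominator):
    """Plain ratio, used for both the NPS ratio and the time speedup."""
    return numerator / denominator

# Figures copied from the non-SMP run and the 2-cpu SMP run above.
NPS_RATIO = ratio(722, 382)                             # 2-cpu nps / non-SMP nps
SPEEDUP = ratio(to_seconds("1:52"), to_seconds("1:00")) # 1-cpu time / 2-cpu time

print(f"NPS ratio: {NPS_RATIO:.2f}X")
print(f"Speedup:   {SPEEDUP:.2f}X")
```

The speedup is computed from elapsed time for the same depth-13 search, so it also reflects the extra nodes the parallel search visits, which is why it comes out slightly below the NPS ratio.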
Sorry that doesn't agree with your nonsense. But _anybody_ can reproduce
those if they want...
I won't cloud things by running it with 4 cpus... Actually, I will cloud
things up further since I know you _hate_ to see real data. Here is the
run on the same single position, with 4 processors:
time=33.32 cpu=396% mat=0 n=45113849 fh=91% nps=1353k
ext-> chk=362181 cap=130161 pp=14737 1rep=18094 mate=740
predicted=0 nodes=45113849 evals=12087522
endgame tablebase-> probes done=0 successful=0
SMP-> split=1465 stop=134 data=13/64 cpu=2:12 elap=33.32
That speedup is what? 3.4X you say?
Can't be, can it? What about the NPS? Ack. 3.54 you get?
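Putting all three runs side by side, a small Python sketch (my own layout, not anything crafty prints) recomputes both figures against the non-SMP run as the baseline. The elapsed times and nps values are copied from the output above, with 1:52 and 1:00 converted to 112 and 60 seconds.

```python
# (cpus, elapsed seconds, nps in thousands), copied from the runs above.
# The first entry is the non-SMP build and serves as the baseline.
RUNS = [
    (1, 112.0, 382),
    (2, 60.0, 722),
    (4, 33.32, 1353),
]

def scaling_table(runs):
    """Return (cpus, time speedup, NPS ratio) rows relative to runs[0]."""
    _, base_time, base_nps = runs[0]
    rows = []
    for cpus, elapsed, nps in runs:
        rows.append((cpus, base_time / elapsed, nps / base_nps))
    return rows

TABLE = scaling_table(RUNS)
for cpus, speedup, nps_ratio in TABLE:
    print(f"{cpus} cpu: speedup {speedup:.2f}X, NPS ratio {nps_ratio:.2f}X")
```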
As I said, it is far easier to actually _run_ something than to wave your hands
and claim something is true.
Of course, I know that the above is the result of my slow 700MHz X 4 machine,
or the 100MHz FSB. Or the 1024K L2 cache. Or some other nonsense. And once
you "correct" the numbers due to the hardware issues and factor in a lot of
other handwaving, I'll be back to 1.0 or worse...
Fortunately, when we play OTB, my program will be running on _my_ hardware,
not some imagined hardware you have dreamed up...
Is there any more to say in this argument?
Do we need to also fix the Cray Blitz argument? You said that the NPS values
did not scale linearly from 2-4-8-16 cpus. And you asked me "what were those
other processors doing?" Why don't you go back and recompute the numbers you
posted, and see if you _really_ want to make that stupid statement again...
<wait a while>
I thought not...