Computer Chess Club Archives


Subject: Re: Here are some actual numbers

Author: Vincent Diepeveen

Date: 14:15:41 04/14/03


On April 13, 2003 at 22:39:39, Robert Hyatt wrote:

>On April 13, 2003 at 11:49:28, Vincent Diepeveen wrote:
>
>>On April 13, 2003 at 11:27:53, Robert Hyatt wrote:
>>
>>I said initially. It drops back to 10 splits a second in DIEP after a while.
>>Search depth matters.
>>
>>Let's compare 2 things.
>>
>> time=45.98  cpu=464%  mat=0  n=37870294  fh=88%  nps=823k
>> ext-> chk=638414 cap=249442 pp=9588 1rep=32966 mate=223
>> predicted=0  nodes=37870294  evals=14565859
>> endgame tablebase-> probes done=0  successful=0
>> hashing-> trans/ref=28%  pawn=93%  used=28%
>> SMP->  split=431  stop=57  data=6/64  cpu=3:33  elap=45.98
>>
>>That is Crafty 18.10 at mt=2, which I have here: 431 splits in about 46 seconds.
>>I guess you must limit the number of splits in Crafty a lot, as splitting is
>>expensive in Crafty compared to the cost of a single node.
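
To put the split counts in this thread on a common scale, here is a minimal Python sketch that turns a Crafty "SMP->" statistics line into splits per second. The function names are hypothetical, and the field layout (split=..., elap=... given as seconds or m:ss) is assumed from the log lines quoted in this thread only.

```python
import re

def parse_elapsed(text):
    """Convert '45.98' or '2:33' (assumed m:ss) into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def splits_per_second(smp_line):
    """Hypothetical helper: split count divided by elapsed time."""
    splits = int(re.search(r"split=(\d+)", smp_line).group(1))
    elapsed = parse_elapsed(re.search(r"elap=([\d:.]+)", smp_line).group(1))
    return splits / elapsed

if __name__ == "__main__":
    line = "SMP->  split=431  stop=57  data=6/64  cpu=3:33  elap=45.98"
    print(round(splits_per_second(line), 2))
```

For the line above this comes out a little under 10 splits a second, which is the figure quoted for DIEP after a while.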
>
>I'm not sure how expensive it is compared to a node.  I'll run a test where
>I do the split overhead at every node to compare, however...
>
>
>
>I don't limit them at all.  The only limit is the YBW algorithm.  But I split
>at the root also, which reduces them significantly...

I can split at the root nowadays, but I have turned it off for DIEP. It gives
too poor a speedup for me. The interesting thing an SMP search can give is
transpositions at a big depth, which a sequential search would possibly
overwrite. I don't want to miss them.

As I showed half a year ago, with SMP at 2 threads/processes there is a somewhat
bigger chance that a transposition cutoff occurs with a depthleft that is, on
average, slightly bigger than when doing deep sequential searches (of course the
hashtable needs to be able to get filled quite a bit, but under practical
tournament conditions this is the case in most programs).

I will, however, experiment again with splitting at the root in a 128 processor
run, once this works very well. Not so much to reduce the number of splits, but
to get the CPUs non-idle sooner (where idling, as we know, is not really idling
at all).

128 CPU runs of 10 minutes are not too expensive: 128 x 10 = 1280 CPU minutes,
and 1280 / 60 is about 21 CPU hours. Of course the only hard thing is when you
are unlucky with a run (each run can be different, and perhaps one time you have
a very poor run which gives a bad speedup, where in reality it would give a
better speedup).
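
The cost arithmetic above can be written out as a small sketch. The function name and the `runs` parameter are assumptions added for illustration; repeating a run is just one obvious way to budget for an unlucky result.

```python
def cpu_hours(n_cpus, minutes_per_run, runs=1):
    """CPU hours consumed by `runs` runs of `minutes_per_run` on `n_cpus` CPUs."""
    return n_cpus * minutes_per_run * runs / 60.0

if __name__ == "__main__":
    # One 10 minute run on 128 CPUs, then the same run repeated 5 times.
    print(cpu_hours(128, 10))
    print(cpu_hours(128, 10, runs=5))
```

A single run costs a bit over 21 CPU hours, so repeating it a handful of times to average out a poor run scales the cost linearly.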

Anyway, splitting at the root doesn't work for me with 2-16 CPUs.

Best regards,
Vincent

>
>>
>>Let's ignore the cpu=464%; I do not understand why it says that. I have it at
>>mt=2. Probably a small I/O bug.
>>
>>Now let's let DIEP search for about the same time:
>>
>>Took 0.12 seconds to start all 1 other processes out of 2
>>00:00     21   0k 0 0 21 (2) 2 (0,0) -0.022 Ng1-f3 d7-d5
>>++ d2-d4 procnr=0 terug=1 org=[-22;-21] newwindow=[-22;520000]
>>00:00     71   0k 0 0 71 (2) 2 (0,0) 0.001 d2-d4 d7-d5
>>00:00    175   0k 0 0 175 (2) 3 (0,2) 0.157 d2-d4 d7-d5 Ng1-f3
>>00:00    443   0k 0 0 443 (2) 4 (0,5) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6
>>00:00 150800 151k 0 0 1508 (2) 5 (0,19) 0.190 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3
>>00:00 318900 319k 0 0 3189 (2) 6 (0,27) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 Nb8-c6
>>00:00 149744 150k 0 0 13477 (2) 7 (3,68) 0.179 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4 Nf6-h5 Bf4-g5
>>00:00 136110 136k 0 0 27222 (2) 8 (6,147) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4 Nf6-h5 Bf4-g5 Nb8-c6
>>00:01 127109 127k 0 0 205917 (2) 9 (45,502) 0.105 d2-d4 Ng8-f6 Nb1-c3 Nb8-c6 Bc1-f4 d7-d6 Ng1-f3 Bc8-f5 e2-e3
>>00:04 127013 127k 0 0 572829 (2) 10 (76,666) 0.001 d2-d4 Ng8-f6 Nb1-c3 d7-d5 Bc1-f4 Bc8-f5 Ng1-f3 Nb8-c6 Nf3-e5 Nf6-e4
>>00:17 152655 153k 0 0 2648566 (2) 11 (330,1980) 0.108 d2-d4 d7-d5 Ng1-f3 Nb8-c6 Nb1-c3 Bc8-f5 Nf3-h4 Bf5-c8 Bc1-g5 Ng8-f6 e2-e3
>>00:38 154041 154k 0 0 5889009 (2) 12 (743,4189) 0.008 d2-d4 d7-d5 Bc1-f4 Bc8-f5 Ng1-f3 Ng8-f6 Nb1-c3 Nb8-c6 Nc3-b5 Ra8-c8 Nf3-e5 Nc6xe5 d4xe5
>>
>>Of course, if I use the same conditions as Crafty for when to split, then it
>>will look different with regard to the number of splits performed.
>>
>>Splitting in DIEP is very cheap. I already split searches with >= 2 ply left,
>>and I split quickly in current versions.
>
>I split everywhere.  It is possible to limit this and I think the current
>version avoids splitting at the last 2-3 plies of the tree.  I haven't tested
>this on my dual to see if the current value is correct, however...
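
For readers who have not seen YBW (Young Brothers Wait), here is a toy Python sketch of the idea discussed above: the eldest brother (first move) is searched sequentially, and only then are the younger brothers handed out in parallel, optionally only when enough depth is left. This is not Crafty's or DIEP's code. The tree encoding (nested lists with integer leaves scored for the side to move), the `MIN_SPLIT_DEPTH` threshold, and all names are assumptions for illustration, and Python threads only show the structure, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

INF = 10**9
MIN_SPLIT_DEPTH = 2  # hypothetical: do not split with fewer plies left

def negamax(node, alpha, beta):
    """Plain sequential fail-soft alpha-beta on a nested-list tree."""
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        best = max(best, -negamax(child, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best

def ybw(node, depth_left, alpha, beta, pool, stats):
    """Search the eldest brother first, then split the younger brothers."""
    if isinstance(node, int):
        return node
    best = -ybw(node[0], depth_left - 1, -beta, -alpha, pool, stats)
    alpha = max(alpha, best)
    younger = node[1:]
    if alpha >= beta or not younger:
        return best
    if depth_left < MIN_SPLIT_DEPTH:
        # Too close to the leaves: finish sequentially.
        for child in younger:
            best = max(best, -negamax(child, -beta, -alpha))
            alpha = max(alpha, best)
            if alpha >= beta:
                break
        return best
    # Split point: younger brothers run in parallel with the bound
    # established by the eldest brother (no bound sharing in this sketch).
    stats["splits"] += 1
    futures = [pool.submit(negamax, child, -beta, -alpha) for child in younger]
    for future in futures:
        best = max(best, -future.result())
    return best

if __name__ == "__main__":
    tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
    stats = {"splits": 0}
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(ybw(tree, 3, -INF, INF, pool, stats), stats)
```

Raising `MIN_SPLIT_DEPTH` reduces the number of split points at the cost of leaving workers without work near the leaves, which is the trade-off argued about in this thread.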
>
>
>> The reason is that you get 500 CPUs busy quicker
>>and find bugs sooner. No doubt in future I will again optimize it to a
>>state where it optimizes search depth more on x86. If that's with many
>>splits a second at 2-4 processes, then I'll go for that. If it is with fewer
>>splits a second, I'll go for that.
>>
>>Note that the 4189 number at 12 ply is not the number of splits only, it is the
>>total number of searches. So about 11*20 + 1 = 220 + 1 = 221 are from searching
>>the root.
>>
>>>On April 13, 2003 at 08:32:37, Vincent Diepeveen wrote:
>>>
>>>>On April 13, 2003 at 08:21:42, Vincent Diepeveen wrote:
>>>>
>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>
>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>
>>>>>>>It _is_ pinned on SMT.  The two logical processors are producing wildly
>>>>>>>imbalanced results when using threads, vs using two separate processes.  It
>>>>>>>would appear to be cache-related...
>>>>>>
>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>
>>>>>>How much of the time are your threads idle, out of curiosity? If one thread is
>>>>>>idle much more than the other, then of course that is going to skew your NPS.
>>>>>>
>>>>>>-Tom
>>>>>
>>>>>Of course both Crafty and DIEP are using YBW. I didn't check what Bob does
>>>>>here, but in the past in DIEP I always let process 0 start the search.
>>>>>Nowadays that is not the case. The I/O thread picks the first process it can
>>>>>get. All search processes are completely identical. This process then
>>>>>starts the search. That means the other CPUs idle while this process starts
>>>>>the search.
>>>>
>>>>Also, do not read that 'idle' in the literal sense. Letting them REALLY idle
>>>>with sleep() or WaitForSingleObject is, on a REAL SMP system (like a dual K7),
>>>>just too expensive. The latency to wake up processors is at sick high levels:
>>>>15 ms, just like that. Imagine that because of the YBW search you have to
>>>>split initially something like 50-100 times a second. 15 ms is a death
>>>>sentence. So 'idle' CPUs are spinning around until some flag is set in a
>>>>shared memory variable. I let them do some arithmetic function 100 times
>>>>while 'idling'.
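
The spin-instead-of-sleep idea described above can be sketched roughly as follows. This is not DIEP's code: the class and function names are hypothetical, and Python's interpreter lock means this only shows the control flow (poll a shared flag, burn a little arithmetic between polls), not the real latency behaviour of a C program on SMP hardware.

```python
import threading

class SpinFlag:
    """Hypothetical stand-in for a flag in shared memory."""
    def __init__(self):
        self.work_ready = False

def busy_work(iterations=100):
    """Some cheap arithmetic to run between polls of the flag."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total

def idle_worker(flag, result):
    """Spin until work is flagged instead of sleeping in the OS."""
    spins = 0
    while not flag.work_ready:
        busy_work(100)
        spins += 1
    result["spins"] = spins
    result["got_work"] = True

if __name__ == "__main__":
    flag = SpinFlag()
    result = {}
    worker = threading.Thread(target=idle_worker, args=(flag, result))
    worker.start()
    flag.work_ready = True  # the splitting process signals new work
    worker.join()
    print(result["got_work"])
```

The design choice is to trade CPU time on the 'idle' processors for a wake-up delay far below the 15 ms quoted for a real sleep.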
>>>
>>>If you do this right you won't split _that_ often.
>>>
>>>              time=35.97  cpu=381%  mat=-1  n=80006982  fh=92%  nps=2224k
>>>              ext-> chk=1487513 cap=353299 pp=32860 1rep=79236 mate=15135
>>>              predicted=3  nodes=80006982  evals=19493470
>>>              endgame tablebase-> probes done=0  successful=0
>>>              SMP->  split=1840  stop=163  data=15/64  cpu=2:17  elap=35.97
>>>              time used:  29.81
>>>
>>>
>>>In the above from a game on ICC, in 35 seconds, I did 1800 splits total.  The
>>>deeper the search the better this becomes...
>>>
>>>              time=2:33  cpu=396%  mat=0  n=282753699  fh=91%  nps=1840k
>>>              ext-> chk=3046093 cap=1083298 pp=16735 1rep=192964 mate=3400
>>>              predicted=8  nodes=282753699  evals=114936261
>>>              endgame tablebase-> probes done=0  successful=0
>>>              SMP->  split=2683  stop=424  data=15/64  cpu=10:09  elap=2:33
>>>              time used:   8.29
>>>
>>>              time=4:03  cpu=396%  mat=0  n=466004128  fh=90%  nps=1911k
>>>              ext-> chk=3120074 cap=1773259 pp=60704 1rep=227466 mate=5595
>>>              predicted=9  nodes=466004128  evals=160300467
>>>              endgame tablebase-> probes done=0  successful=0
>>>              SMP->  split=5811  stop=950  data=18/64  cpu=16:06  elap=4:03
>>>              time used:   2:43
>>>
>>>              time=3:47  cpu=396%  mat=0  n=421757405  fh=92%  nps=1855k
>>>              ext-> chk=3436512 cap=1222511 pp=75583 1rep=186606 mate=3165
>>>              predicted=12  nodes=421757405  evals=149496490
>>>              endgame tablebase-> probes done=0  successful=0
>>>              SMP->  split=3524  stop=337  data=17/64  cpu=15:01  elap=3:47
>>>
>>>>
>>>>>In Crafty that's also the case, but I do not know whether Bob always picks a
>>>>>certain thread first. If so, then that might explain quite a lot.
>>>>>
>>>>>Measuring idle time with SMT is very hard to do objectively, but of course
>>>>>you can check it in relative terms. Basically the problem is that you do not
>>>>>know what the maximum % is that I can get out of SMT, because it depends on
>>>>>the other process too.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
