Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Here are some actual numbers

Author: Robert Hyatt
Date: 11:26:15 04/16/03
On April 16, 2003 at 07:54:40, Vincent Diepeveen wrote:

>On April 16, 2003 at 00:05:02, Robert Hyatt wrote:
>
>Try searching all the knuth root moves with window [-infinite;infinite]
>

For what reason??


>Then you'll know soon.
>
>Using the wrong alfa beta values in root, and in crafty you are not aborting
>either when it has wrong alfa beta values in contradiction to what i do in DIEP,
>will take longer than needed.

I don't know what that means (not aborting).  When I fail high _or_ low, I abort
_instantly_ and re-set the alpha/beta window in the correct direction.


>
>Note that knuth also showed that the minimum tree is about equal to the square
>root of the total search space. That comes down to the branching factor equal to
>sqrt(#semilegal moves) = sqrt(40) = 6.32
>
>I assume you have that branching factor with crafty too, because Knuth says so?

Nope.  Knuth's tree size doesn't apply to my trees because of null-move.
Everything _else_
he says is correct about alpha/beta in general, however.  Including which moves
need to be
searched.

And by demonstration, I can show that splitting at the root produces good
results, _when_
done reasonably.  Done haphazardly, anything can happen, of course.



>
>>On April 15, 2003 at 08:56:54, Vincent Diepeveen wrote:
>>
>>>On April 14, 2003 at 17:43:12, Robert Hyatt wrote:
>>>
>>>>On April 14, 2003 at 17:15:41, Vincent Diepeveen wrote:
>>>>
>>>>>On April 13, 2003 at 22:39:39, Robert Hyatt wrote:
>>>>>
>>>>>>On April 13, 2003 at 11:49:28, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On April 13, 2003 at 11:27:53, Robert Hyatt wrote:
>>>>>>>
>>>>>>>I said initially. It drops back to 10 splits a second in DIEP after a while.
>>>>>>>Search depth matters.
>>>>>>>
>>>>>>>Let's compare 2 things.
>>>>>>>
>>>>>>> time=45.98  cpu=464%  mat=0  n=37870294  fh=88%  nps=823k
>>>>>>> ext-> chk=638414 cap=249442 pp=9588 1rep=32966 mate=223
>>>>>>> predicted=0  nodes=37870294  evals=14565859
>>>>>>> endgame tablebase-> probes done=0  successful=0
>>>>>>> hashing-> trans/ref=28%  pawn=93%  used=28%
>>>>>>> SMP->  split=431  stop=57  data=6/64  cpu=3:33  elap=45.98
>>>>>>>
>>>>>>>MT 2  crafty 18.10 which i have here. 431 splits at 45 seconds. I guess you must
>>>>>>>limit in crafty the number of splits a lot as splitting is expensive in crafty
>>>>>>>when compared to the costs of a single node.
>>>>>>
>>>>>>I'm not sure how expensive it is compared to a node.  I'll run a test where
>>>>>>I do the split overhead at every node to compare, however...
>>>>>>
>>>>>>
>>>>>>
>>>>>>I don't limit them at all.  The only limit is the YBW algorithm.  But I split
>>>>>>at the root also, which reduces them signficantly...
>>>>>
>>>>>I can split at the root nowadays, but i have turned it off for diep. it gives
>>>>>too poor speedup for me. The interesting thing which searching SMP can give is
>>>>>transpositions at a big depth which possibly are overwritten by a sequential
>>>>>search. i don't want to miss them.
>>>>
>>>>Maybe you don't split at the root correctly.  I limit this with some intelligent
>>>>guesswork, so that if it appears that I might change my mind this iteration,
>>>>then
>>>>I don't split at the root until I have searched all moves that I think might
>>>>replace
>>>>the best move...
>>>
>>>i don't have a bug there. With just 2 or 4 cpu's you can split at so many points
>>>that i chose to not split at the root. It gives a bad speedup. Not near 2.0 to
>>>be precise.
>>
>>Sorry, but that is wrong.  The root is the _perfect_ place to split to avoid
>>search overhead, because we _know_ that every root move must be searched, while
>>we can't say that with 100% accuracy about any _other_ position in the tree...
>>
>>See Knuth/Moore, "An analysis of alpha/beta pruning"
>>
>>
>>>
>>>However if splitting is as expensive as it is in crafty i can very well
>>>understand you do it.
>>
>>It's not done because it is too expensive to not split at the root.  It is
>>done (by me) because it is more _efficient_ to do so...  The math is simple.
>>
>>
>>>
>>>>
>>>>>
>>>>>As i showed half a year ago the chance is a bigger with SMP 2 threads/processes
>>>>>that the chance that a transposition cutoff occurs with a depthleft a slighly
>>>>>bit bigger on average than when doing deep sequential searches (of course
>>>>>hashtable needs to be able to get filled quite some, but under practical
>>>>>tournament conditions this is the case in most programs).
>>>>>
>>>>>I will however again experiment with splitting in root with a 128 processor run,
>>>>>when this works very well. Not to reduce number of splits so much but to get the
>>>>>cpu's sooner non-idling (where idling as we know is not really idling at all).
>>>>>
>>>>>128 cpu runs of 10 minutes are not too expensive. 1280 minutes / 60 = 21 cpu
>>>>>hour. Of course the only hard thing is when you are unlucky with a run (each run
>>>>>can be different and perhaps one time you have a very poor run which gives a bad
>>>>>speedup, where reality is it would give a better speedup).
>>>>>
>>>>>Anyway splitting in root doesn't work for me with 2-16 cpu's.
>>>>
>>>>As I said, you have to think about it.  There are ways to make it work, and it
>>>>lowers overhead drastically when it is done correctly.  (search overhead goes
>>>>down).
>>>
>>>i have it to work, it just doesn't give a very good speedup at all, that's why i
>>>do not do it.
>>>
>>>Right now there is a simple extra 'if' condition
>>>
>>> if( .. && realply > 1 ..
>>>
>>>If i kick it out then it splits at the root.
>>>
>>
>>That's not good enough, as I said..
>>
>>There is _more_ to consider when making the decision...
>>
>>
>>>
>>>
>>>>>
>>>>>Best regards,
>>>>>Vincent
>>>>>
>>>>>>
>>>>>>>
>>>>>>>Let's ignore the cpu=464% i do not understand why it says that. I have it at
>>>>>>>mt=2. probably small i/o bug.
>>>>>>>
>>>>>>>Now let's diep search for around this time:
>>>>>>>
>>>>>>>Took 0.12 seconds to start all 1 other processes out of 2
>>>>>>>00:00     21   0k 0 0 21 (2) 2 (0,0) -0.022 Ng1-f3 d7-d5
>>>>>>>++ d2-d4 procnr=0 terug=1 org=[-22;-21] newwindow=[-22;520000]
>>>>>>>00:00     71   0k 0 0 71 (2) 2 (0,0) 0.001 d2-d4 d7-d5
>>>>>>>00:00    175   0k 0 0 175 (2) 3 (0,2) 0.157 d2-d4 d7-d5 Ng1-f3
>>>>>>>00:00    443   0k 0 0 443 (2) 4 (0,5) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6
>>>>>>>00:00 150800 151k 0 0 1508 (2) 5 (0,19) 0.190 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3
>>>>>>>00:00 318900 319k 0 0 3189 (2) 6 (0,27) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 N
>>>>>>>b8-c6
>>>>>>>00:00 149744 150k 0 0 13477 (2) 7 (3,68) 0.179 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4
>>>>>>>Nf6-h5 Bf4-g5
>>>>>>>00:00 136110 136k 0 0 27222 (2) 8 (6,147) 0.001 d2-d4 d7-d5 Ng1-f3 Ng8-f6 Bc1-f4
>>>>>>> Nf6-h5 Bf4-g5 Nb8-c6
>>>>>>>00:01 127109 127k 0 0 205917 (2) 9 (45,502) 0.105 d2-d4 Ng8-f6 Nb1-c3 Nb8-c6 Bc1
>>>>>>>-f4 d7-d6 Ng1-f3 Bc8-f5 e2-e3
>>>>>>>00:04 127013 127k 0 0 572829 (2) 10 (76,666) 0.001 d2-d4 Ng8-f6 Nb1-c3 d7-d5 Bc1
>>>>>>>-f4 Bc8-f5 Ng1-f3 Nb8-c6 Nf3-e5 Nf6-e4
>>>>>>>00:17 152655 153k 0 0 2648566 (2) 11 (330,1980) 0.108 d2-d4 d7-d5 Ng1-f3 Nb8-c6
>>>>>>>Nb1-c3 Bc8-f5 Nf3-h4 Bf5-c8 Bc1-g5 Ng8-f6 e2-e3
>>>>>>>00:38 154041 154k 0 0 5889009 (2) 12 (743,4189) 0.008 d2-d4 d7-d5 Bc1-f4 Bc8-f5
>>>>>>>Ng1-f3 Ng8-f6 Nb1-c3 Nb8-c6 Nc3-b5 Ra8-c8 Nf3-e5 Nc6xe5 d4xe5
>>>>>>>
>>>>>>>Of course if i use same conditions like crafty when to split then it will look
>>>>>>>different with regards to the number of splits performed.
>>>>>>>
>>>>>>>Splitting in diep is very cheap. I already split >= 2 ply left searches and i
>>>>>>>split quickly in current versions.
>>>>>>
>>>>>>I split everywhere.  It is possible to limit this and I think the current
>>>>>>version avoids splitting at the last 2-3 plies of the tree.  I haven't tested
>>>>>>this on my dual to see if the current value is correct, however...
>>>>>>
>>>>>>
>>>>>>> The reason is that you get 500 cpu's quicker
>>>>>>>busy and find bugs sooner. No doubt in future i will again optimize it to a
>>>>>>>state where it will optimize search depth more at x86. If that's with many
>>>>>>>splits a second at 2-4 processes, then i'll go for that. If it is with less
>>>>>>>splits a second i'll go for that.
>>>>>>>
>>>>>>>Note that the 4189 number at 12 ply is not the number of splits only, it is the
>>>>>>>total number of searches. So about 11*20 + 1 = 220 + 1 = 221 are from searching
>>>>>>>the root.
>>>>>>>
>>>>>>>>On April 13, 2003 at 08:32:37, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>On April 13, 2003 at 08:21:42, Vincent Diepeveen wrote:
>>>>>>>>>
>>>>>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>>>>>>
>>>>>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>>>>>>
>>>>>>>>>>>>It _is_ pinned on SMT.  The two logical processors are producing wildly
>>>>>>>>>>>>imbalanced results when using threads, vs using two separate processes.  It
>>>>>>>>>>>>would appear to be cache-related...
>>>>>>>>>>>
>>>>>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>>>>>>
>>>>>>>>>>>How much of the time are your threads idle, out of curiosity? If one thread is
>>>>>>>>>>>idle much more than the other, then of course that is going to skew your NPS.
>>>>>>>>>>>
>>>>>>>>>>>-Tom
>>>>>>>>>>
>>>>>>>>>>Of course both Crafty and DIEP are using YBW. I didn't checkout what bob does
>>>>>>>>>>here, but in past in DIEP i used to always let process 0 let the search start.
>>>>>>>>>>Nowadays that is not the case. The i/o thread picks the first process it can
>>>>>>>>>>get. All search processes are completely identical. This process then is
>>>>>>>>>>starting the search. That means the other CPUs idle when this process starts the
>>>>>>>>>>search.
>>>>>>>>>
>>>>>>>>>also read that 'idle' not in litterary sense. Letting them REALLY idle with
>>>>>>>>>sleep() or WaitForSingleObject, is at a REAL smp system (like dual K7) just too
>>>>>>>>>expensive. Latency to wake up processors is at sick high levels. 15 ms just like
>>>>>>>>>that. Imagine that because of the YBW search, you have to split initially like
>>>>>>>>>50-100 times a second. 15ms is death sentence. So 'idle' cpu's are spinning
>>>>>>>>>around until at a shared memory variable some flag is set. I let them do some
>>>>>>>>>arithmetic function for a 100 times while 'idling'.
>>>>>>>>
>>>>>>>>If you do this right you won't split _that_ often.
>>>>>>>>
>>>>>>>>              time=35.97  cpu=381%  mat=-1  n=80006982  fh=92%  nps=2224k
>>>>>>>>              ext-> chk=1487513 cap=353299 pp=32860 1rep=79236 mate=15135
>>>>>>>>              predicted=3  nodes=80006982  evals=19493470
>>>>>>>>              endgame tablebase-> probes done=0  successful=0
>>>>>>>>              SMP->  split=1840  stop=163  data=15/64  cpu=2:17  elap=35.97
>>>>>>>>              time used:  29.81
>>>>>>>>
>>>>>>>>
>>>>>>>>In the above from a game on ICC, in 35 seconds, I did 1800 splits total.  The
>>>>>>>>deeper the search the better this becomes...
>>>>>>>>
>>>>>>>>              time=2:33  cpu=396%  mat=0  n=282753699  fh=91%  nps=1840k
>>>>>>>>              ext-> chk=3046093 cap=1083298 pp=16735 1rep=192964 mate=3400
>>>>>>>>              predicted=8  nodes=282753699  evals=114936261
>>>>>>>>              endgame tablebase-> probes done=0  successful=0
>>>>>>>>              SMP->  split=2683  stop=424  data=15/64  cpu=10:09  elap=2:33
>>>>>>>>              time used:   8.29
>>>>>>>>
>>>>>>>>              time=4:03  cpu=396%  mat=0  n=466004128  fh=90%  nps=1911k
>>>>>>>>              ext-> chk=3120074 cap=1773259 pp=60704 1rep=227466 mate=5595
>>>>>>>>              predicted=9  nodes=466004128  evals=160300467
>>>>>>>>              endgame tablebase-> probes done=0  successful=0
>>>>>>>>              SMP->  split=5811  stop=950  data=18/64  cpu=16:06  elap=4:03
>>>>>>>>              time used:   2:43
>>>>>>>>
>>>>>>>>              time=3:47  cpu=396%  mat=0  n=421757405  fh=92%  nps=1855k
>>>>>>>>              ext-> chk=3436512 cap=1222511 pp=75583 1rep=186606 mate=3165
>>>>>>>>              predicted=12  nodes=421757405  evals=149496490
>>>>>>>>              endgame tablebase-> probes done=0  successful=0
>>>>>>>>              SMP->  split=3524  stop=337  data=17/64  cpu=15:01  elap=3:47
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>In crafty that's also the case, but i do not know whether Bob always picks a
>>>>>>>>>>certain thread as first. If so then that might explain quite something.
>>>>>>>>>>
>>>>>>>>>>Measuring idle time with SMT is very hard to do objective, but of course you can
>>>>>>>>>>relatively check it out. Basically the problem is you do not know what the
>>>>>>>>>>maximum % is that i can get out of SMT, because it is dependant upon the other
>>>>>>>>>>process too.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.