Computer Chess Club Archives

Subject: Re: The need to unmake move

Author: Robert Hyatt
Date: 08:12:40 09/04/03
On September 04, 2003 at 07:04:33, Vincent Diepeveen wrote:

>On September 03, 2003 at 16:44:10, Robert Hyatt wrote:
>
>>On September 03, 2003 at 15:35:01, Vincent Diepeveen wrote:
>>
>>>On September 03, 2003 at 14:10:53, Robert Hyatt wrote:
>>>
>>>>On September 03, 2003 at 13:33:19, Gian-Carlo Pascutto wrote:
>>>>
>>>>>On September 03, 2003 at 13:22:50, Robert Hyatt wrote:
>>>>>
>>>>>>It is a _TINY_ part of the total time spent.  So tiny, it can be ignored.
>>>>>
>>>>>Que?
>>>>>
>>>>>Maybe so on an SMP quad (as I stated), but surely not on a large NUMA system.
>>>>>
>>>>>If this isn't the issue, I'd expect my thing to run like the blazes
>>>>>on a NUMA box, but I doubt I'm that lucky.
>>>>>
>>>>>--
>>>>>GCP
>>>>
>>>>
>>>>There are three things that have to be done by a thread:
>>>>
>>>>1.  copy local data somewhere else for another thread to use (splitting in
>>>>crafty terminology).  That happens once per "split".  How many splits are done?
>>>>
>>>>Here is the data I provided in another thread here...
>>>>
>>>>              SMP->  split=6266  stop=875  data=19/64  cpu=10:00  elap=2:39
>>>>              SMP->  split=3511  stop=440  data=16/64  cpu=5:20  elap=1:27
>>>>              SMP->  split=3768  stop=524  data=17/64  cpu=5:45  elap=1:33
>>>>              SMP->  split=1724  stop=275  data=13/64  cpu=3:59  elap=1:04
>>>>              SMP->  split=4894  stop=671  data=15/64  cpu=3:55  elap=1:03
>>>>              SMP->  split=2666  stop=420  data=15/64  cpu=3:51  elap=1:02
>>>>              SMP->  split=3412  stop=683  data=17/64  cpu=3:46  elap=1:00
>>>>              SMP->  split=3447  stop=476  data=15/64  cpu=3:55  elap=1:03
>>>>              SMP->  split=2985  stop=345  data=19/64  cpu=1:13  elap=19.53
>>>>              SMP->  split=11657  stop=1620  data=23/64  cpu=3:32  elap=58.12
>>>>              SMP->  split=1928  stop=292  data=17/64  cpu=3:24  elap=57.08
>>>>              SMP->  split=53912  stop=6999  data=30/64  cpu=32:06  elap=8:42
>>>>              SMP->  split=9997  stop=1209  data=23/64  cpu=3:31  elap=56.69
>>>>              SMP->  split=2966  stop=527  data=19/64  cpu=3:28  elap=55.49
>>>>
>>>>Worst case was 54000 splits for a 9 minute long search.  Using 4 processors.
>>>>More typical seems to be about 500 splits per minute of search.  That is
>>>>not much time.
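
The "SMP->" lines above can be turned into rates with a few lines of Python. This is only a sketch: it assumes the cpu= and elap= fields are either m:ss or plain seconds, and the helper names (to_seconds, parse, splits_per_second) are made up for illustration and are not part of Crafty.

```python
import re

# Three of the "SMP->" lines quoted above, copied verbatim.
LOG = """\
SMP->  split=6266  stop=875  data=19/64  cpu=10:00  elap=2:39
SMP->  split=2985  stop=345  data=19/64  cpu=1:13  elap=19.53
SMP->  split=53912  stop=6999  data=30/64  cpu=32:06  elap=8:42
"""


def to_seconds(text):
    """Convert 'm:ss' or plain 'ss.ss' into seconds (assumed field format)."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse(line):
    """Pick the key=value fields out of one log line."""
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    return {
        "split": int(fields["split"]),
        "stop": int(fields["stop"]),
        "cpu": to_seconds(fields["cpu"]),
        "elap": to_seconds(fields["elap"]),
    }


def splits_per_second(rec):
    """Splits per second of elapsed (wall clock) time."""
    return rec["split"] / rec["elap"]


records = [parse(line) for line in LOG.splitlines()]
for rec in records:
    print(f"{rec['split']:6d} splits in {rec['elap']:7.2f}s elapsed = "
          f"{splits_per_second(rec):6.1f} splits/s, "
          f"cpu/elap = {rec['cpu'] / rec['elap']:.2f}")
```

For the worst-case line (53912 splits in 8:42) this works out to roughly 103 splits per second of elapsed time, shared across the four processors.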
>>>
>>>05:11 <nps censored> 0 0 487383460 (130) 14 (85565,1592299) 0.001 d2-d4 Ng8-f6
>>>Ng1-f3 d7-d5 Bc1-f4 e7-e6 e2-e3 Bf8-d6 Bf1-e2 Nb8-c6 O-O Bd6xf4 e3xf4 O-O
>>>
>>>1.59 million splits / 311 seconds = 5119 splits a second,
>>>or 39 splits a second per processor.
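
The arithmetic in the two lines above can be checked directly. The sketch assumes that the "(130)" in the quoted output line is the processor count, which is consistent with the per-processor figure given but is not stated in the post.

```python
# Figures taken from the quoted output line. Reading "(130)" as the
# processor count is an assumption, not something the post states.
splits = 1_592_299
elapsed_seconds = 5 * 60 + 11   # "05:11"
processors = 130

per_second = splits / elapsed_seconds
per_second_per_cpu = per_second / processors

print(f"{per_second:.1f} splits/s in total")
print(f"{per_second_per_cpu:.1f} splits/s per processor")
```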
>>>
>>>Of course in crafty you limit the number of splits bigtime by the conditions
>>>used.
>>
>>Yes, but I can still drive 4 cpus to good utilization.  If I limit splits,
>>that utilization goes down significantly.  Some samples (Notice I _always_
>>give real data rather than waving my hands):
>>
>>split with N plies remaining:
>>
>>     N     splits done     cpu utilization     elapsed time
>>
>>     1        10203             395%               30.3s
>>     2         5017             386%               28.1s
>>     3         1982             385%               28.2s
>>     4         1011             385%               29.7s
>>     5          677             377%               27.8s
>>     6          352             364%               28.5s
>>
>>The fastest setting for this particular position seems to be smpmin=5,
>>where the default is 4.  But over many tests, smpmin=4 seems to be the
>>right value for this version of crafty on this hardware.
>>
>>I hardly call that "limiting the number of splits done big-time".  N=1
>>means I split at the last ply and call q-search in parallel.  N=2 means I
>>split at the next to last ply.  Ditto through N=6.  This was a search to
>>depth=13 in a middlegame position, for reference.
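
To see how little the elapsed time moves while the split count changes by a factor of almost 30, the table above can be loaded and summarized as follows. This is a sketch only; the numbers are copied from the table, and the 400% ceiling assumes the four-processor machine mentioned earlier in the thread.

```python
# (plies remaining at the split, splits done, cpu utilization %, elapsed s)
RUNS = [
    (1, 10203, 395, 30.3),
    (2, 5017, 386, 28.1),
    (3, 1982, 385, 28.2),
    (4, 1011, 385, 29.7),
    (5, 677, 377, 27.8),
    (6, 352, 364, 28.5),
]

MAX_UTILIZATION = 400  # assumed: four processors, 100% each

for n, splits, util, elapsed in RUNS:
    print(f"N={n}: {splits / elapsed:7.1f} splits/s, "
          f"{util / MAX_UTILIZATION:.1%} of 4 cpus, {elapsed:.1f}s")

# Fastest run for this one position.
best = min(RUNS, key=lambda run: run[3])
split_ratio = max(r[1] for r in RUNS) / min(r[1] for r in RUNS)
time_ratio = max(r[3] for r in RUNS) / min(r[3] for r in RUNS)
print(f"fastest: N={best[0]} at {best[3]}s")
print(f"split counts vary by {split_ratio:.1f}x, elapsed time by {time_ratio:.2f}x")
```

The fastest row is N=5, matching the smpmin=5 remark above; this is a single position, so it says nothing about which value is best on average.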
>>
>>
>>
>>>
>>>But the more splits a second the better the speedup according to my
>>>measurements.
>>
>>Depends.  Splits near the tips are not as good, because move ordering at the
>>tips is worse than move ordering farther up in the tree.
>>
>>>
>>>When I split on a dual 10 times a second per CPU, the speedup is about 1.7,
>>>like Crafty.
>>
>>That's the first time I have seen you use 1.7 for Crafty.  Usually it is
>>1.0 or 1.2.  Finally coming back to the real world after testing some?
>
>It depends on how you measure.
>
>You always have stuff that cripples its play but is good for speedup.
>Remember that I tested Crafty on a dual K7 with asymmetric king safety turned
>off.
>

SO?  asymmetric king safety does _not_ make it run better or worse in parallel.
Perhaps for a position or two here and there, one or the other is much better.
But overall, no.  I've _already_ run that test and posted the results.


>If you printed the objective node counts at each main variation, we would
>know directly what we are talking about. Crafty's search is too inefficient to
>take its parallel search seriously.

Works better than yours however.  _I_ don't have mysterious crashes and bugs
all over the place.  That "inefficient search" sure seems to give you plenty
of problems on ICC.



>
>>
>>
>>
>>
>>>
>>>When I let Diep split 30 or 40 times a second, the speedup is 1.9 to 2.0.
>>
>>
>>Or > 2, no doubt if you split 100 times a second...  :)
>>
>>
>>
>>>
>>>Thank you,
>>>Vincent
>>>
>>>>2.  Search.  Here I only do local memory accesses, so there is just normal
>>>>tree search overhead, nothing related to NUMA.
>>>>
>>>>3.  Completion.  Here I have to either copy a score/PV (or just a score) back
>>>>to the parent thread's data, or set a "stop" flag to say my result is good
>>>>enough and no others are needed.  Either of these is a trivial amount of
>>>>non-local memory traffic.
>>>>
>>>>If you do that right, NUMA should not hurt.  The issue is going to become
>>>>how to use a large number of processors, which is much harder to do than
>>>>to use a small number, as we do today.
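
The three steps described in the quoted text (copy at the split, local-only search, small write-back at completion) can be sketched as below. This is not Crafty's code: SplitPoint, worker, search_split and the evaluate callback are hypothetical names, evaluate stands in for a full subtree search, and Python threads do not run CPU-bound work in parallel, so the sketch shows only the data flow, not the performance.

```python
import copy
import threading


class SplitPoint:
    """Shared record for one split (hypothetical structure)."""

    def __init__(self, position, moves, alpha, beta):
        self.lock = threading.Lock()
        self.position = position
        self.moves = list(moves)
        self.alpha = alpha
        self.beta = beta
        self.best_move = None
        self.stop = threading.Event()

    def next_move(self):
        """Hand out the next unsearched move, or None when done or stopped."""
        with self.lock:
            if self.stop.is_set() or not self.moves:
                return None
            return self.moves.pop(0)

    def report(self, move, score):
        """Step 3: write a score back, or raise the stop flag on a cutoff."""
        with self.lock:
            if score > self.alpha:
                self.alpha = score
                self.best_move = move
            if self.alpha >= self.beta:
                self.stop.set()


def worker(split, evaluate):
    # Step 1: copy the shared data once per split into thread-local storage.
    local_position = copy.deepcopy(split.position)
    while True:
        move = split.next_move()
        if move is None:
            break
        # Step 2: search using only the local copy.
        score = evaluate(local_position, move)
        split.report(move, score)


def search_split(position, moves, alpha, beta, evaluate, threads=4):
    split = SplitPoint(position, moves, alpha, beta)
    pool = [threading.Thread(target=worker, args=(split, evaluate))
            for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return split.best_move, split.alpha


# Toy usage: the "position" is just a table of move scores.
toy_position = {"a": 1, "b": 5, "c": 3}
result = search_split(toy_position, ["a", "b", "c"], -100, 100,
                      lambda pos, move: pos[move])
print(result)
```

The only shared accesses are the one-time copy, fetching the next move, and the short report call; everything inside evaluate touches local data only, which is the property the quoted text relies on for NUMA machines.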
Last modified: Thu, 15 Apr 21 08:11:13 -0700