Computer Chess Club Archives


Subject: Re: The need to unmake move

Author: Vincent Diepeveen

Date: 04:04:33 09/04/03


On September 03, 2003 at 16:44:10, Robert Hyatt wrote:

>On September 03, 2003 at 15:35:01, Vincent Diepeveen wrote:
>
>>On September 03, 2003 at 14:10:53, Robert Hyatt wrote:
>>
>>>On September 03, 2003 at 13:33:19, Gian-Carlo Pascutto wrote:
>>>
>>>>On September 03, 2003 at 13:22:50, Robert Hyatt wrote:
>>>>
>>>>>It is a _TINY_ part of the total time spent.  So tiny, it can be ignored.
>>>>
>>>>Que?
>>>>
>>>>Maybe so on an SMP quad (as I stated), but surely not on a large NUMA system.
>>>>
>>>>If this isn't the issue, I'd expect my thing to run like the blazes
>>>>on a NUMA box, but I doubt I'm that lucky.
>>>>
>>>>--
>>>>GCP
>>>
>>>
>>>There are three things that have to be done by a thread:
>>>
>>>1.  copy local data somewhere else for another thread to use (splitting in
>>>crafty terminology).  That happens once per "split".  How many splits are done?
>>>
>>>Here is the data I provided in another thread here...
>>>
>>>              SMP->  split=6266  stop=875  data=19/64  cpu=10:00  elap=2:39
>>>              SMP->  split=3511  stop=440  data=16/64  cpu=5:20  elap=1:27
>>>              SMP->  split=3768  stop=524  data=17/64  cpu=5:45  elap=1:33
>>>              SMP->  split=1724  stop=275  data=13/64  cpu=3:59  elap=1:04
>>>              SMP->  split=4894  stop=671  data=15/64  cpu=3:55  elap=1:03
>>>              SMP->  split=2666  stop=420  data=15/64  cpu=3:51  elap=1:02
>>>              SMP->  split=3412  stop=683  data=17/64  cpu=3:46  elap=1:00
>>>              SMP->  split=3447  stop=476  data=15/64  cpu=3:55  elap=1:03
>>>              SMP->  split=2985  stop=345  data=19/64  cpu=1:13  elap=19.53
>>>              SMP->  split=11657  stop=1620  data=23/64  cpu=3:32  elap=58.12
>>>              SMP->  split=1928  stop=292  data=17/64  cpu=3:24  elap=57.08
>>>              SMP->  split=53912  stop=6999  data=30/64  cpu=32:06  elap=8:42
>>>              SMP->  split=9997  stop=1209  data=23/64  cpu=3:31  elap=56.69
>>>              SMP->  split=2966  stop=527  data=19/64  cpu=3:28  elap=55.49
>>>
>>>Worst case was 54000 splits for a 9 minute long search.  Using 4 processors.
>>>More typical seems to be about 500 splits per minute of search.  That is
>>>not much time.
>>
>>05:11 <nps censored> 0 0 487383460 (130) 14 (85565,1592299) 0.001 d2-d4 Ng8-f6
>>Ng1-f3 d7-d5 Bc1-f4 e7-e6 e2-e3 Bf8-d6 Bf1-e2 Nb8-c6 O-O Bd6xf4 e3xf4 O-O
>>
>>1.59 million splits / 311 seconds = 5119 splits a second.
>>That is 39 splits a second per processor.
>>
>>Of course in crafty you limit the number of splits bigtime by the conditions
>>used.
>
>Yes, but I can still drive 4 cpus to good utilization.  If I limit splits,
>that utilization goes down significantly.  Some samples (Notice I _always_
>give real data rather than waving my hands):
>
>Split with N plies remaining:
>
>     N     splits done     cpu utilization     elapsed time
>
>     1           10203                395%            30.3s
>     2            5017                386%            28.1s
>     3            1982                385%            28.2s
>     4            1011                385%            29.7s
>     5             677                377%            27.8s
>     6             352                364%            28.5s
>
>The fastest setting for this particular position seems to be smpmin=5,
>where the default is 4.  But over many tests, smpmin=4 seems to be the
>right value for this version of crafty, this hardware.
>
>I hardly call that "limiting the number of splits done big-time".  N=1
>means I split at the last ply and call q-search in parallel.  N=2 means I
>split at the next to last ply.  Ditto through N=6.  This was a search to
>depth=13 in a middlegame position, for reference.
>
>
>
>>
>>But the more splits a second the better the speedup according to my
>>measurements.
>
>Depends.  Splits near the tips are not as good, because move ordering at the
>tips is worse than move ordering farther up in the tree.
>
>>
>>When I split on a dual 10 times a second per cpu, the speedup is about 1.7, like
>>crafty.
>
>That's the first time I have seen you use 1.7 for Crafty.  Usually it is
>1.0 or 1.2.  Finally coming back to the real world after testing some?

It depends on how you measure.

You always have things that cripple its play but are good for speedup.
Remember that I tested crafty on a dual K7 with asymmetric king safety turned
off.

If you would print out the objective node counts at each main variation, then we
would know directly what we are talking about. Crafty's search is too inefficient
to take its parallel search seriously.
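
For anyone who wants to recheck the split rates quoted in this thread, here is a minimal Python sketch. It is not taken from Crafty or Diep. The function names are made up for illustration, and it assumes that times in the "SMP->" lines are printed as m:ss when they reach a minute and as plain seconds otherwise. The processor count of 130 is inferred from the "(130)" field and the 39-per-processor figure, not stated outright.

```python
import re

# Hypothetical helper names; the field layout is copied from the
# "SMP->" lines quoted in this thread.
SMP_LINE = re.compile(
    r"split=(\d+)\s+stop=(\d+)\s+data=\d+/\d+\s+cpu=(\S+)\s+elap=(\S+)"
)


def to_seconds(text):
    """Convert 'm:ss' or plain seconds such as '19.53' to seconds.

    Assumption: values below one minute are printed as bare seconds.
    """
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(text)


def parse_smp_line(line):
    """Extract splits, stops, cpu time and elapsed time from one log line."""
    match = SMP_LINE.search(line)
    if match is None:
        raise ValueError("not an SMP statistics line: " + line)
    return {
        "splits": int(match.group(1)),
        "stops": int(match.group(2)),
        "cpu": to_seconds(match.group(3)),
        "elapsed": to_seconds(match.group(4)),
    }


def split_rates(rec):
    """Derive the rates that the two sides of the thread are comparing."""
    return {
        "splits_per_elapsed_second": rec["splits"] / rec["elapsed"],
        "splits_per_cpu_minute": rec["splits"] / (rec["cpu"] / 60.0),
        # cpu time divided by wall time: how many processors were kept busy
        "cpu_over_elapsed": rec["cpu"] / rec["elapsed"],
    }


def splits_per_second_per_cpu(splits, seconds, cpus):
    """Splits per second for each processor, as in the Diep figures."""
    return splits / seconds / cpus


if __name__ == "__main__":
    rec = parse_smp_line(
        "SMP->  split=6266  stop=875  data=19/64  cpu=10:00  elap=2:39"
    )
    print(split_rates(rec))
    print(splits_per_second_per_cpu(1592299, 311, 130))
```

For the first log line this gives 6266 splits over 600 cpu seconds and 159 elapsed seconds, so roughly 627 splits per cpu minute and a cpu/elapsed ratio of about 3.77. Note that cpu/elapsed only says how busy the processors were. It says nothing about how many extra nodes the parallel search visited, which is the reason for asking for node counts.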

>
>
>
>
>>
>>When I let Diep split 30 or 40 times a second, the speedup is 1.9 to 2.0.
>
>
>Or > 2, no doubt if you split 100 times a second...  :)
>
>
>
>>
>>Thank you,
>>Vincent
>>
>>>2.  Search.  Here I only do local memory accesses, so there is just normal
>>>tree search overhead, nothing related to NUMA.
>>>
>>>3.  completion.  Here I have to either copy a score/PV (or just a score) back to
>>>the parent thread data, or set a "stop" flag to say my result is good enough and
>>>no others are needed.  Either of these is a trivial amount of non-local memory
>>>traffic.
>>>
>>>If you do that right, NUMA should not hurt.  The issue is going to become
>>>how to use a large number of processors, which is much harder to do than
>>>to use a small number as we are today.
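
The three steps in the quoted message (copy at the split, search on local data, report back or raise a stop flag) can be sketched roughly as follows. This is a toy illustration in Python and not Crafty's code. SplitPoint, helper, search_split and the evaluate callback are all hypothetical names, and Python threads do not model NUMA memory placement at all. The point is only to show which data is touched by more than one thread.

```python
import copy
import threading


class SplitPoint:
    """Shared record for one split. Hypothetical structure, not Crafty's."""

    def __init__(self, position, moves, beta):
        self.position = position        # parent's state, copied once per helper
        self.moves = list(moves)        # moves that still need searching
        self.beta = beta
        self.best_score = float("-inf")
        self.best_move = None
        self.stop = False               # "my result is good enough" flag
        self.lock = threading.Lock()

    def next_move(self):
        """Hand out the next unsearched move, or None when done or stopped."""
        with self.lock:
            if self.stop or not self.moves:
                return None
            return self.moves.pop(0)

    def report(self, move, score):
        """Step 3: copy the result back to the parent, or set the stop flag."""
        with self.lock:
            if score > self.best_score:
                self.best_score = score
                self.best_move = move
            if score >= self.beta:
                self.stop = True


def helper(split, evaluate):
    # Step 1: copy the parent's data once, at the split.
    local = copy.deepcopy(split.position)
    while True:
        move = split.next_move()
        if move is None:
            return
        # Step 2: search using only the thread's own copy.
        score = evaluate(local, move)
        split.report(move, score)


def search_split(position, moves, beta, evaluate, threads=4):
    """Search the given moves in parallel and return (move, score, stopped)."""
    split = SplitPoint(position, moves, beta)
    workers = [
        threading.Thread(target=helper, args=(split, evaluate))
        for _ in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return split.best_move, split.best_score, split.stop
```

In this sketch the only shared traffic per move is one call to next_move and one call to report, which corresponds to the claim that the non-local memory traffic is small compared with the search itself. Whether that holds on a large NUMA machine depends on how often splits happen, which is the disagreement in this thread.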



Last modified: Thu, 15 Apr 21 08:11:13 -0700
