Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Crafty Cluster Feasible in Future?

Author: Robert Hyatt

Date: 07:43:11 02/20/04

Go up one level in this thread


On February 19, 2004 at 15:09:25, Dann Corbit wrote:

>On February 19, 2004 at 14:55:40, Robert Hyatt wrote:
>
>>On February 19, 2004 at 14:25:07, Dann Corbit wrote:
>>
>>>On February 19, 2004 at 14:14:11, Robert Hyatt wrote:
>>>[snip]
>>>>Clusters can work.  1000 nodes is a totally different problem from using 1-64,
>>>>but nothing I see says it is an impossible task.  Impossible to "some" perhaps
>>>>so.  Lots of things are impossible to "some".  Fortunately, for the rest of us,
>>>>what is "impossible" is something that just takes a bit longer to do... :)
>>>
>>>A cluster notion (e.g. 64000 machines maybe using the network):
>>>
>>>Divide the machines into clumps that work on forward nodes, and multiple
>>>possible best moves.
>>
>>I already do this on large SMP platforms.  IE crafty has a "group" concept, so
>>that with (say) 64 nodes, you don't get 64 nodes working on one split point.
>>That is bad for multiple reasons.  Usually there are not 64 moves to search so
>>the split overhead is just pure overhead.  Also, on NUMA boxes, you don't want
>>64 processors banging on one "split block" even though they don't do it that
>>much unless the split is too deep in the tree (this is also controllable in
>>current crafty).
>>
>>I suspect the same idea will be the right way for clusters.  Divide it into
>>groups, sub-groups, and so forth.  The "groups" have a "head" and the initial
>>split is between the "group heads".  Then each group head can further split its
>>tree farther down the path, with sub-group members.  Of course each of these
>>might be a "sub-group-head" that could later further split the tree among
>>sub-sub-group members.
>>
>>I think some such hierarchy is necessary for truly large numbers of processors,
>>and the potential for search overhead is going to be significant.
>
>Have one CPU that looks at the whole tree and makes recommendations (a sort of
>loadmaster).  In that way, the CPU pool does not get divided evenly.  So (in the
>first move example) moves like 1. e4, 1. d4, 1. Nc3 get 20,000 CPUs each and
>things like 1. a3 might only get 10 CPUs.
>
>This concept can also carry forward as far as you like.

I don't think that will work.  The reason is that there _are_ positions where a
move like a3 is the key move.  I don't think that some sort of static decision
can be made as to which move deserves the most (or least) processors.  It seems
to make more sense for this decision to be dynamic and based on what is actually
happening within the tree, not on some static characteristic of the individual
moves, which might be misleading.

My main goal was to avoid having too many processors work together, which causes
unnecessary overhead.  IE on the Cray C90/T90, with 16/32 processors, we had to
do the same sort of thing, otherwise in endgames, there are plenty of positions
with less than 32 or even less than 16 moves, and we didn't want the split
overhead there to go out of control..  However, on the cray it was a one-level
hierarchy, where it might be much more complex on a large cluster-type or
NUMA-type box that could have way more than 32 processors.

My feeling is that for any architecture, there is an "optimal" number of
processors that divide up a chunk of work and search the same tree in parallel.
You apply this "number" recursively in a divide-and-conquer approach to increase
the total number of processors "in the loop" without ever having more than some
finite number working on a particular node together.

I also have a depth limit on splits in Crafty, to control the split overhead,
which is not "free".  Every time I change hardware, I run tests to see if the
number should be changed, but since I left the quad pentium-pro 200 and changed
it then, it has not been changed since.  But again, it is an intent to adjust
the algorithm to the specific hardware, limiting overhead to some reasonable
level while doing so.

I suspect over time, other constraints will be found and allowed for in the
split algorithm.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.