Computer Chess Club Archives



Subject: Re: Crafty Cluster Feasible in Future?

Author: Bob Durrett

Date: 18:25:33 02/20/04



On February 20, 2004 at 21:22:12, Robert Hyatt wrote:

>On February 20, 2004 at 13:29:51, Dann Corbit wrote:
>
>>On February 20, 2004 at 10:43:11, Robert Hyatt wrote:
>>
>>>On February 19, 2004 at 15:09:25, Dann Corbit wrote:
>>>
>>>>On February 19, 2004 at 14:55:40, Robert Hyatt wrote:
>>>>
>>>>>On February 19, 2004 at 14:25:07, Dann Corbit wrote:
>>>>>
>>>>>>On February 19, 2004 at 14:14:11, Robert Hyatt wrote:
>>>>>>[snip]
>>>>>>>Clusters can work.  1000 nodes is a totally different problem from using 1-64,
>>>>>>>but nothing I see says it is an impossible task.  Perhaps it is impossible to
>>>>>>>"some"; lots of things are impossible to "some".  Fortunately, for the rest of
>>>>>>>us, what is "impossible" is just something that takes a bit longer to do... :)
>>>>>>
>>>>>>A cluster notion (e.g. 64000 machines maybe using the network):
>>>>>>
>>>>>>Divide the machines into clumps that work on forward nodes, and multiple
>>>>>>possible best moves.
>>>>>
>>>>>I already do this on large SMP platforms.  I.e., Crafty has a "group" concept,
>>>>>so that with (say) 64 processors, you don't get all 64 working on one split
>>>>>point.  That is bad for multiple reasons.  Usually there are not 64 moves to
>>>>>search, so the split overhead is just pure overhead.  Also, on NUMA boxes, you
>>>>>don't want 64 processors banging on one "split block", even though they don't
>>>>>do that much unless the split is too deep in the tree (this is also
>>>>>controllable in current Crafty).
>>>>>
>>>>>I suspect the same idea will be the right way for clusters.  Divide it into
>>>>>groups, sub-groups, and so forth.  The "groups" have a "head" and the initial
>>>>>split is between the "group heads".  Then each group head can further split its
>>>>>tree farther down the path, with sub-group members.  Of course each of these
>>>>>might be a "sub-group-head" that could later further split the tree among
>>>>>sub-sub-group members.
>>>>>
>>>>>I think some such hierarchy is necessary for truly large numbers of processors,
>>>>>and the potential for search overhead is going to be significant.
>>>>
>>>>Have one CPU that looks at the whole tree and makes recommendations (a sort of
>>>>loadmaster).  In that way, the CPU pool does not get divided evenly.  So (in the
>>>>first move example) moves like 1. e4, 1. d4, 1. Nc3 get 20,000 CPUs each and
>>>>things like 1. a3 might only get 10 CPUs.
>>>>
>>>>This concept can also carry forward as far as you like.
>>>
>>>I don't think that will work.  The reason is that there _are_ positions where a
>>>move like a3 is the key move.
>>
>>But there are still ten CPUs working on that one.  About like the system that
>>recently pounded the opposition into oblivion on CCT-6
>
>Yes, but if there are thousands on _other_ moves, and you run out of time before
>the "10" have an answer, you never know the move is good.  This is a problem I
>have worked on in Cray Blitz and Crafty for many years.  My current approach has
>some specific stuff for this case, in fact...
>
>
>>
>>> I don't think that some sort of static decision
>>>can be made as to which move deserves the most (or least) processors.  It seems
>>>to make more sense for this decision to be dynamic and based on what is actually
>>>happening within the tree, not on some static characteristic of the individual
>>>moves, which might be misleading.
>>
>>The way that I suggest is how humans play chess.  We look at the bad lines, but
>>we focus much more on the stuff that looks good.
>
>Aha.  But your task now is to write the function that makes this statement work
>correctly:
>
>    if ( FeelsGood(move) ) ...
>
>:)
>
>It isn't so easy...  I've been trying to do that for years, GMs have told me
>many times that is the "key".  :)
>
>>
>>>My main goal was to avoid having too many processors work together, which causes
>>>unnecessary overhead.  I.e., on the Cray C90/T90, with 16/32 processors, we had
>>>to do the same sort of thing; otherwise in endgames, there are plenty of
>>>positions with fewer than 32 or even fewer than 16 moves, and we didn't want the
>>>split overhead there to go out of control.  However, on the Cray it was a
>>>one-level hierarchy, whereas it might be much more complex on a large
>>>cluster-type or NUMA-type box that could have way more than 32 processors.
>>>
>>>My feeling is that for any architecture, there is an "optimal" number of
>>>processors that divide up a chunk of work and search the same tree in parallel.
>>>You apply this "number" recursively in a divide-and-conquer approach to increase
>>>the total number of processors "in the loop" without ever having more than some
>>>finite number working on a particular node together.
>>>
>>>I also have a depth limit on splits in Crafty, to control the split overhead,
>>>which is not "free".  Every time I change hardware, I run tests to see if the
>>>number should be changed, but I have not had to change it since I left the quad
>>>Pentium Pro 200.  Again, the intent is to adjust the algorithm to the specific
>>>hardware, limiting overhead to some reasonable level while doing so.
>>>
>>>I suspect over time, other constraints will be found and allowed for in the
>>>split algorithm.
>>
>>It is a very interesting area for study.  The ChessBrain project has some
>>interesting stuff going on in that regard, and they use the internet as the
>>medium for the CPU farm.  The fact that they are able to succeed against much
>>stronger programs than the leaf processors shows that even a distributed system
>>with very slow links can be feasible.
>
>
>Never said it couldn't.  That was Vincent.  I've been a "cluster believer" for
>many years...
>
>:)
>
>I will have a cluster Crafty one day also...  once we get our Opteron cluster, I
>plan on a serious look...  Right now we are targeting a 128-node cluster with
>each node being a dual Opteron box...
>
>In theory that could max out at 4M nps per node, which across 128 nodes gives
>Deep Blue-like speeds of 512M nodes per second.  Of course it won't be as
>efficient as my current SMP search, but it will be interesting to see how close
>it can get...

Like many words in the English language, "cluster" could mean many things, even
to an electrical engineer.

I would like to know what "cluster" means in the context of this thread, so I
can follow it better.

Help!!!!

Bob D.



