Computer Chess Club Archives



Subject: Re: Crafty Cluster Feasible in Future?

Author: Bob Durrett

Date: 07:54:52 02/21/04



On February 21, 2004 at 09:49:03, Robert Hyatt wrote:

>On February 21, 2004 at 09:02:43, Bob Durrett wrote:
>
>>On February 21, 2004 at 01:24:29, Robert Hyatt wrote:
>>
>>>On February 20, 2004 at 21:25:33, Bob Durrett wrote:
>>>
>>>>On February 20, 2004 at 21:22:12, Robert Hyatt wrote:
>>>>
>>>>>On February 20, 2004 at 13:29:51, Dann Corbit wrote:
>>>>>
>>>>>>On February 20, 2004 at 10:43:11, Robert Hyatt wrote:
>>>>>>
>>>>>>>On February 19, 2004 at 15:09:25, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On February 19, 2004 at 14:55:40, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On February 19, 2004 at 14:25:07, Dann Corbit wrote:
>>>>>>>>>
>>>>>>>>>>On February 19, 2004 at 14:14:11, Robert Hyatt wrote:
>>>>>>>>>>[snip]
>>>>>>>>>>>Clusters can work.  1000 nodes is a totally different problem from using 1-64,
>>>>>>>>>>>but nothing I see says it is an impossible task.  Impossible to "some" perhaps
>>>>>>>>>>>so.  Lots of things are impossible to "some".  Fortunately, for the rest of us,
>>>>>>>>>>>what is "impossible" is something that just takes a bit longer to do... :)
>>>>>>>>>>
>>>>>>>>>>A cluster notion (e.g. 64000 machines maybe using the network):
>>>>>>>>>>
>>>>>>>>>>Divide the machines into clumps that work on forward nodes, and multiple
>>>>>>>>>>possible best moves.
>>>>>>>>>
>>>>>>>>>I already do this on large SMP platforms.  IE crafty has a "group" concept, so
>>>>>>>>>that with (say) 64 nodes, you don't get 64 nodes working on one split point.
>>>>>>>>>That is bad for multiple reasons.  Usually there are not 64 moves to search so
>>>>>>>>>the split overhead is just pure overhead.  Also, on NUMA boxes, you don't want
>>>>>>>>>64 processors banging on one "split block" even though they don't do it that
>>>>>>>>>much unless the split is too deep in the tree (this is also controllable in
>>>>>>>>>current crafty).
>>>>>>>>>
>>>>>>>>>I suspect the same idea will be the right way for clusters.  Divide it into
>>>>>>>>>groups, sub-groups, and so forth.  The "groups" have a "head" and the initial
>>>>>>>>>split is between the "group heads".  Then each group head can further split its
>>>>>>>>>tree farther down the path, with sub-group members.  Of course each of these
>>>>>>>>>might be a "sub-group-head" that could later further split the tree among
>>>>>>>>>sub-sub-group members.
>>>>>>>>>
>>>>>>>>>I think some such hierarchy is necessary for truly large numbers of processors,
>>>>>>>>>and the potential for search overhead is going to be significant.
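[A minimal sketch of the group/sub-group hierarchy described above, assuming a fixed maximum group size. GROUP_SIZE, hierarchy_levels, and the numbers are illustrative, not Crafty's actual code.]

```c
/* Illustrative sketch only -- not Crafty's code.  Divide nprocs
   processors into groups of at most GROUP_SIZE; each group elects a
   head, the heads are grouped again, and so on recursively until a
   single top-level head remains. */
#define GROUP_SIZE 4  /* assumed max processors cooperating at one split point */

/* Number of group/sub-group levels needed for nprocs processors. */
int hierarchy_levels(int nprocs) {
  int levels = 0;
  while (nprocs > 1) {
    /* one head survives per group of up to GROUP_SIZE members */
    nprocs = (nprocs + GROUP_SIZE - 1) / GROUP_SIZE;
    levels++;
  }
  return levels;
}
```

[With 256 processors and groups of 4 this gives 256 -> 64 -> 16 -> 4 -> 1, i.e. four levels of heads, sub-heads, and sub-sub-heads, so no single split point ever has more than 4 workers banging on it.]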
>>>>>>>>
>>>>>>>>Have one CPU that looks at the whole tree and makes recommendations (a sort of
>>>>>>>>loadmaster).  In that way, the CPU pool does not get divided evenly.  So (in the
>>>>>>>>first move example) moves like 1. e4, 1. d4, 1. Nc3 get 20,000 CPUs each and
>>>>>>>>things like 1. a3 might only get 10 CPUs.
>>>>>>>>
>>>>>>>>This concept can also carry forward as far as you like.
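[A toy version of the "loadmaster" allocation above, assuming each root move has already been assigned some promise score. All names and numbers here are hypothetical.]

```c
#define NMOVES 4  /* root moves under consideration (illustrative) */

/* Give every move a floor of min_cpus so no line is starved, then
   split the remaining CPU pool in proportion to each move's score. */
void allocate(const int score[NMOVES], int pool, int min_cpus,
              int cpus[NMOVES]) {
  int total = 0;
  for (int i = 0; i < NMOVES; i++) total += score[i];
  int rest = pool - NMOVES * min_cpus;
  for (int i = 0; i < NMOVES; i++)
    cpus[i] = min_cpus + rest * score[i] / total;
}
```

[With scores {50, 30, 15, 5}, a pool of 1000 CPUs, and a floor of 10, the moves get 490, 298, 154, and 58 CPUs. The weak move keeps its small share, which is exactly the problem Hyatt raises next: whether that share can answer in time.]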
>>>>>>>
>>>>>>>I don't think that will work.  The reason is that there _are_ positions where a
>>>>>>>move like a3 is the key move.
>>>>>>
>>>>>>But there are still ten CPUs working on that one.  About like the system that
>>>>>>recently pounded the opposition into oblivion on CCT-6
>>>>>
>>>>>Yes, but if there are thousands on _other_ moves, and you run out of time before
>>>>>the "10" have an answer, you never know the move is good.  This is a problem I
>>>>>have worked on in Cray Blitz and Crafty for many years.  My current approach has
>>>>>some specific stuff for this case, in fact...
>>>>>
>>>>>
>>>>>>
>>>>>>> I don't think that some sort of static decision
>>>>>>>can be made as to which move deserves the most (or least) processors.  It seems
>>>>>>>to make more sense for this decision to be dynamic and based on what is actually
>>>>>>>happening within the tree, not on some static characteristic of the individual
>>>>>>>moves, which might be misleading.
>>>>>>
>>>>>>The way that I suggest is how humans play chess.  We look at the bad lines, but
>>>>>>we focus much more on the stuff that looks good.
>>>>>
>>>>>Aha.  But your task now is to write the function that makes this statement work
>>>>>correctly:
>>>>>
>>>>>    if ( FeelsGood(move) ) ...
>>>>>
>>>>>:)
>>>>>
>>>>>It isn't so easy...  I've been trying to do that for years, GMs have told me
>>>>>many times that is the "key".  :)
>>>>>
>>>>>>
>>>>>>>My main goal was to avoid having too many processors work together, which causes
>>>>>>>unnecessary overhead.  IE on the Cray C90/T90, with 16/32 processors, we had to
>>>>>>>do the same sort of thing, otherwise in endgames, there are plenty of positions
>>>>>>>with less than 32 or even less than 16 moves, and we didn't want the split
>>>>>>>overhead there to go out of control.  However, on the cray it was a one-level
>>>>>>>hierarchy, where it might be much more complex on a large cluster-type or
>>>>>>>NUMA-type box that could have way more than 32 processors.
>>>>>>>
>>>>>>>My feeling is that for any architecture, there is an "optimal" number of
>>>>>>>processors that divide up a chunk of work and search the same tree in parallel.
>>>>>>>You apply this "number" recursively in a divide-and-conquer approach to increase
>>>>>>>the total number of processors "in the loop" without ever having more than some
>>>>>>>finite number working on a particular node together.
>>>>>>>
>>>>>>>I also have a depth limit on splits in Crafty, to control the split overhead,
>>>>>>>which is not "free".  Every time I change hardware, I run tests to see if the
>>>>>>>number should be changed, but since I left the quad pentium-pro 200 and changed
>>>>>>>it then, it has not been changed since.  But again, it is an intent to adjust
>>>>>>>the algorithm to the specific hardware, limiting overhead to some reasonable
>>>>>>>level while doing so.
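[The depth limit described above amounts to a cheap gate checked before every split. A hedged sketch; the names and the tunable are made up, not Crafty's actual code.]

```c
/* Illustrative only.  Split the tree only near the root, where the
   remaining subtrees are large enough to repay the cost of setting up
   a split point, and only when more than one move is left to share. */
int ok_to_split(int ply, int max_split_ply, int moves_remaining) {
  return ply <= max_split_ply && moves_remaining > 1;
}
```

[Here max_split_ply plays the role of the hardware-dependent constant that gets re-measured whenever the hardware changes.]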
>>>>>>>
>>>>>>>I suspect over time, other constraints will be found and allowed for in the
>>>>>>>split algorithm.
>>>>>>
>>>>>>It is a very interesting area for study.  The ChessBrain project has some
>>>>>>interesting stuff going on in that regard, and they use the internet as the
>>>>>>medium for the CPU farm.  The fact that they are able to succeed against much
>>>>>>stronger programs than the leaf processors shows that even a distributed system
>>>>>>with very slow links can be feasible.
>>>>>
>>>>>
>>>>>Never said it couldn't.  That was Vincent.  I've been a "cluster believer" for
>>>>>many years...
>>>>>
>>>>>:)
>>>>>
>>>>>I will have a cluster crafty one day also...  once we get our opteron cluster, I
>>>>>plan on a serious look...  Right now we are targeting a 128 node cluster with
>>>>>each node being a dual opteron box...
>>>>>
>>>>>In theory that could max out at 4M nps per node, or deep-blue-like speeds of
>>>>>512M nodes per second.  Of course it won't be as efficient as my current SMP
>>>>>search, but it will be interesting to see how close it can get...
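[The back-of-envelope arithmetic behind those figures, assuming roughly 2M nps per Opteron processor (so 4M per dual-processor node):]

```c
/* 128 nodes x 4M nps per node (dual Opteron at ~2M nps per CPU)
   = 512M nps for the whole cluster, ignoring parallel-search losses. */
long long cluster_nps(long long nodes, long long nps_per_node) {
  return nodes * nps_per_node;
}
```

[cluster_nps(128, 4000000) gives 512,000,000, the Deep Blue-class figure quoted above; real efficiency would be lower, as the post itself notes.]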
>>>>
>>>>Like many words in the English language, "cluster" could mean many things, even
>>>>to an electrical engineer.
>>>>
>>>>I would like to know what "Cluster" means in the context of this thread, so I
>>>>can understand this thread better.
>>>>
>>>>Help!!!!
>>>>
>>>>Bob D.
>>>
>>>
>>>Buy a dual-processor opteron box.  Then buy a second one and connect them with
>>>something decent, say at least gigabit ethernet.  Now you have a two-node
>>>cluster.
>>>
>>>Clusters imply no shared memory, just some sort of message-passing facility that
>>>lets the nodes communicate in some reasonably efficient manner.  Nothing is as
>>>good as a real shared memory machine (SMP).  Next comes NUMA machines that are
>>>similar to SMP but with memory latencies that vary depending on how far it is
>>>from one processor to the memory it wants to access.  Then come clusters that
>>>are based on various connection facilities, from plain 100mbit ethernet, to
>>>gigabit ethernet, to more exotic (and way faster) interconnections like our old
>>>cLAN cluster that has very low latency and very high bandwidth, but at way
>>>expensive cost...
>>
>>OK, so my concept of a cluster was not so far off after all.  I had been
>>thinking that it didn't matter how the different cluster elements were arranged
>>with respect to each other and the above seems to agree with that.
>>
>>By way of contrast, neural networks seem different.  What would be the minimum
>>requirements to be met for something to be called a neural network, and are
>>neural networks considered to be a special type of the clusters you are
>>discussing in this thread?
>>
>>Bob D.
>
>
>A neural network is a specific approach to solving a problem by using a
>human-like (in some twisted sense) approximation to basic brain building blocks.
> It really doesn't matter how the "neurons" communicate.  There, the issue is
>only speed.  IE the various layers of "neurons" communicate in some fashion,
>where a SMP box provides the fastest communication, a cluster the slowest.  But
>either way it would still be a neural network.  Just as either would allow us to
>build a parallel chess search engine, but one would definitely have a
>performance advantage over the other, node for node.

It sounds like there may yet be room for advancement here.

Bob D.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.