Computer Chess Club Archives



Subject: Re: Crafty Cluster Feasible in Future?

Author: Bob Durrett

Date: 14:27:33 02/21/04



On February 21, 2004 at 14:31:14, Robert Hyatt wrote:

>On February 21, 2004 at 10:54:52, Bob Durrett wrote:
>
>>On February 21, 2004 at 09:49:03, Robert Hyatt wrote:
>>
>>>On February 21, 2004 at 09:02:43, Bob Durrett wrote:
>>>
>>>>On February 21, 2004 at 01:24:29, Robert Hyatt wrote:
>>>>
>>>>>On February 20, 2004 at 21:25:33, Bob Durrett wrote:
>>>>>
>>>>>>On February 20, 2004 at 21:22:12, Robert Hyatt wrote:
>>>>>>
>>>>>>>On February 20, 2004 at 13:29:51, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On February 20, 2004 at 10:43:11, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On February 19, 2004 at 15:09:25, Dann Corbit wrote:
>>>>>>>>>
>>>>>>>>>>On February 19, 2004 at 14:55:40, Robert Hyatt wrote:
>>>>>>>>>>
>>>>>>>>>>>On February 19, 2004 at 14:25:07, Dann Corbit wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On February 19, 2004 at 14:14:11, Robert Hyatt wrote:
>>>>>>>>>>>>[snip]
>>>>>>>>>>>>>Clusters can work.  1000 nodes is a totally different problem from using 1-64,
>>>>>>>>>>>>>but nothing I see says it is an impossible task.  Impossible to "some" perhaps
>>>>>>>>>>>>>so.  Lots of things are impossible to "some".  Fortunately, for the rest of us,
>>>>>>>>>>>>>what is "impossible" is something that just takes a bit longer to do... :)
>>>>>>>>>>>>
>>>>>>>>>>>>A cluster notion (e.g. 64000 machines maybe using the network):
>>>>>>>>>>>>
>>>>>>>>>>>>Divide the machines into clumps that work on forward nodes, and multiple
>>>>>>>>>>>>possible best moves.
>>>>>>>>>>>
>>>>>>>>>>>I already do this on large SMP platforms.  IE Crafty has a "group" concept, so
>>>>>>>>>>>that with (say) 64 processors, you don't get all 64 working on one split point.
>>>>>>>>>>>That is bad for multiple reasons.  Usually there are not 64 moves to search, so
>>>>>>>>>>>the split overhead is just pure overhead.  Also, on NUMA boxes, you don't want
>>>>>>>>>>>64 processors banging on one "split block", even though they don't do that
>>>>>>>>>>>much unless the split is too deep in the tree (this is also controllable in
>>>>>>>>>>>current Crafty).
>>>>>>>>>>>
>>>>>>>>>>>I suspect the same idea will be the right way for clusters.  Divide it into
>>>>>>>>>>>groups, sub-groups, and so forth.  The "groups" have a "head" and the initial
>>>>>>>>>>>split is between the "group heads".  Then each group head can further split its
>>>>>>>>>>>tree farther down the path, with sub-group members.  Of course each of these
>>>>>>>>>>>might be a "sub-group-head" that could later further split the tree among
>>>>>>>>>>>sub-sub-group members.
>>>>>>>>>>>
>>>>>>>>>>>I think some such hierarchy is necessary for truly large numbers of processors,
>>>>>>>>>>>and the potential for search overhead is going to be significant.
>>>>>>>>>>
>>>>>>>>>>Have one CPU that looks at the whole tree and makes recommendations (a sort of
>>>>>>>>>>loadmaster).  In that way, the CPU pool does not get divided evenly.  So (in the
>>>>>>>>>>first move example) moves like 1. e4, 1. d4, 1. Nc3 get 20,000 CPUs each and
>>>>>>>>>>things like 1. a3 might only get 10 CPUs.
>>>>>>>>>>
>>>>>>>>>>This concept can also carry forward as far as you like.
>>>>>>>>>
>>>>>>>>>I don't think that will work.  The reason is that there _are_ positions where a
>>>>>>>>>move like a3 is the key move.
>>>>>>>>
>>>>>>>>But there are still ten CPUs working on that one.  About like the system that
>>>>>>>>recently pounded the opposition into oblivion on CCT-6.
>>>>>>>
>>>>>>>Yes, but if there are thousands on _other_ moves, and you run out of time before
>>>>>>>the "10" have an answer, you never know the move is good.  This is a problem I
>>>>>>>have worked on in Cray Blitz and Crafty for many years.  My current approach has
>>>>>>>some specific stuff for this case, in fact...
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> I don't think that some sort of static decision
>>>>>>>>>can be made as to which move deserves the most (or least) processors.  It seems
>>>>>>>>>to make more sense for this decision to be dynamic and based on what is actually
>>>>>>>>>happening within the tree, not on some static characteristic of the individual
>>>>>>>>>moves, which might be misleading.
>>>>>>>>
>>>>>>>>The way that I suggest is how humans play chess.  We look at the bad lines, but
>>>>>>>>we focus much more on the stuff that looks good.
>>>>>>>
>>>>>>>Aha.  But your task now is to write the function that makes this statement work
>>>>>>>correctly:
>>>>>>>
>>>>>>>    if ( FeelsGood(move) ) ...
>>>>>>>
>>>>>>>:)
>>>>>>>
>>>>>>>It isn't so easy...  I've been trying to do that for years; GMs have told me
>>>>>>>many times that this is the "key".  :)
>>>>>>>
>>>>>>>>
>>>>>>>>>My main goal was to avoid having too many processors work together, which causes
>>>>>>>>>unnecessary overhead.  IE on the Cray C90/T90, with 16/32 processors, we had to
>>>>>>>>>do the same sort of thing; otherwise, in endgames, there are plenty of positions
>>>>>>>>>with fewer than 32 or even fewer than 16 moves, and we didn't want the split
>>>>>>>>>overhead there to go out of control.  However, on the Cray it was a one-level
>>>>>>>>>hierarchy, where it might be much more complex on a large cluster-type or
>>>>>>>>>NUMA-type box that could have far more than 32 processors.
>>>>>>>>>
>>>>>>>>>My feeling is that for any architecture, there is an "optimal" number of
>>>>>>>>>processors that divide up a chunk of work and search the same tree in parallel.
>>>>>>>>>You apply this "number" recursively in a divide-and-conquer approach to increase
>>>>>>>>>the total number of processors "in the loop" without ever having more than some
>>>>>>>>>finite number working on a particular node together.
>>>>>>>>>
>>>>>>>>>I also have a depth limit on splits in Crafty, to control the split overhead,
>>>>>>>>>which is not "free".  Every time I change hardware, I run tests to see if the
>>>>>>>>>number should be changed, but it has not needed changing since I adjusted it
>>>>>>>>>after leaving the quad Pentium Pro 200.  But again, the intent is to adjust
>>>>>>>>>the algorithm to the specific hardware, limiting overhead to some reasonable
>>>>>>>>>level while doing so.
>>>>>>>>>
>>>>>>>>>I suspect over time, other constraints will be found and allowed for in the
>>>>>>>>>split algorithm.
>>>>>>>>
>>>>>>>>It is a very interesting area for study.  The ChessBrain project has some
>>>>>>>>interesting stuff going on in that regard, and they use the internet as the
>>>>>>>>medium for the CPU farm.  The fact that they are able to succeed against much
>>>>>>>>stronger programs than the leaf processors shows that even a distributed system
>>>>>>>>with very slow links can be feasible.
>>>>>>>
>>>>>>>
>>>>>>>Never said it couldn't.  That was Vincent.  I've been a "cluster believer" for
>>>>>>>many years...
>>>>>>>
>>>>>>>:)
>>>>>>>
>>>>>>>I will have a cluster crafty one day also...  once we get our opteron cluster, I
>>>>>>>plan on a serious look...  Right now we are targeting a 128 node cluster with
>>>>>>>each node being a dual opteron box...
>>>>>>>
>>>>>>>In theory that could max out at 4M nps per node, or deep-blue-like speeds of
>>>>>>>512M nodes per second.  Of course it won't be as efficient as my current SMP
>>>>>>>search, but it will be interesting to see how close it can get...
>>>>>>
>>>>>>Like many words in the English language, "cluster" could mean many things, even
>>>>>>to an electrical engineer.
>>>>>>
>>>>>>I would like to know what "Cluster" means in the context of this thread, so I
>>>>>>can understand this thread better.
>>>>>>
>>>>>>Help!!!!
>>>>>>
>>>>>>Bob D.
>>>>>
>>>>>
>>>>>Buy a dual-processor opteron box.  Then buy a second one and connect them with
>>>>>something decent, say at least gigabit ethernet.  Now you have a two-node
>>>>>cluster.
>>>>>
>>>>>Clusters imply no shared memory, just some sort of message-passing facility that
>>>>>lets the nodes communicate in some reasonably efficient manner.  Nothing is as
>>>>>good as a real shared-memory machine (SMP).  Next come NUMA machines, which are
>>>>>similar to SMP but with memory latencies that vary depending on how far it is
>>>>>from a processor to the memory it wants to access.  Then come clusters, which
>>>>>are based on various connection facilities, from plain 100Mbit ethernet, to
>>>>>gigabit ethernet, to more exotic (and far faster) interconnects like our old
>>>>>cLAN cluster, which has very low latency and very high bandwidth, but at a
>>>>>steep cost...
>>>>
>>>>OK, so my concept of a cluster was not so far off after all.  I had been
>>>>thinking that it didn't matter how the different cluster elements were arranged
>>>>with respect to each other and the above seems to agree with that.
>>>>
>>>>By way of contrast, neural networks seem different.  What would be the minimum
>>>>requirements to be met for something to be called a neural network, and are
>>>>neural networks considered to be a special type of the clusters you are
>>>>discussing in this thread?
>>>>
>>>>Bob D.
>>>
>>>
>>>A neural network is a specific approach to solving a problem by using a
>>>human-like (in some twisted sense) approximation to basic brain building blocks.
>>>It really doesn't matter how the "neurons" communicate; there, the issue is
>>>only speed.  IE the various layers of "neurons" communicate in some fashion,
>>>where an SMP box provides the fastest communication and a cluster the slowest.
>>>But either way it would still be a neural network, just as either would allow
>>>us to build a parallel chess search engine, though one would definitely have a
>>>performance advantage over the other, node for node.
>>
>>It sounds like there may yet be room for advancement here.
>>
>>Bob D.
>
>The problem with neural nets is "training".  IE for some tasks, the training is
>quick and the proficiency is very high after that training.  But chess is a game
>of exceptions to rules as well as a game of rules.  Neural nets don't really
>extract the same "information" from a game (or a position) that we do.  We
>simply tell them "here is a position, here is the score" and we repeat that N
>times.  The larger N, the better (in general) the final product.  I am highly
>suspicious, however, that for chess, N is beyond ridiculous, and the internal
>geometry of the "network" will be huge as well.
>
>I simply think in our chess domain, neural nets won't work, even though they are
>quite good in some domains...

Maybe it's for the future.  Neural nets would require, I assume, very small
microprocessors, with perhaps many of them on the same die.  Multi-layer
structures may become practical if future technology does not require so much
heat dissipation.

When they find new technology much better than today's nanotechnology, neural
nets and similar structures may come into their own, so to speak.
Unfortunately, we may all be gone long before that time.

As for using neural nets for chess, there may be room for us to learn how best
to educate the neural net, much as human brains are educated.  A case in point
is deciding what knowledge to impart first.  I think it is endgames.

Bob D.




