Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Would This Configuration Work?

Author: Robert Henry Durrett

Date: 17:09:28 05/30/02

Go up one level in this thread


On May 30, 2002 at 09:09:19, David Dory wrote:

>On May 29, 2002 at 23:19:52, Robert Henry Durrett wrote:
>
>>On May 29, 2002 at 21:57:35, David Dory wrote:
>>
>>>On May 29, 2002 at 20:44:07, Robert Henry Durrett wrote:
>>>
>>>>Would this Configuration Work [At least theoretically]?
>>>>
>>>>Perhaps all in a single box, conveniently called a “COMPUTER”:
>>>>
>>>>Have eight motherboards, each with one processor.  Motherboards and processors
>>>>would be the fastest available.  Each motherboard with it’s own RAM, also
>>>>fastest available.  Everything optimized for use as chess engines.  Chess engine
>>>>software, possibly all identical, loaded into each MB/RAM/Processor.  I.e. each
>>>>board operates as a chess computer, independently from the other seven “fast”
>>>>boards.  Essentially, there would be eight chess computers all performing
>>>>independently, one on each board.
>>>>
>>>>Have two more motherboards, not necessarily fast at all, with relatively slow
>>>>processors.  RAM might be not so special either, although maybe should be if
>>>>used for transferring data to other processors via read/write to this RAM.
>>>>
>>>>The chess move gets entered into the first “slow” processor.  It computes all
>>>>legal two-ply [or more?] move sequences.  That would not take much time, so fast
>>>>processors not necessary.  Then the set of possible two-ply sequences is
>>>>partitioned into eight subsets, all roughly equal in size.
>>>>
>>>>Then the first subset is sent to the first fast board, the second subset sent to
>>>>the second fast board, etc., with the eighth subset sent to the eighth fast
>>>>board.
>>>>
>>>>Each fast board [i.e. chess computer] then finds the best move sequence of those
>>>>sent to it.  It does not consider any other cases.
>>>>
>>>>Then each of the eight fast boards sends it’s “best move,” along with any
>>>>supporting data, to the second “slow” processor.
>>>>
>>>>Finally, the “slow” processor selects the best of the eight available moves
>>>>[using the "supporting data" as appropriate] and outputs the COMPUTER’s move.
>>>>If two best are precisely equal, may have to pick one by “flip of coin.”
>>>>
>>>>No one needs to know how many boards are in the box.  [May as well be sneaky.]
>>>>
>>>>Incidentally, the box must have the necessary supporting hardware, such as power
>>>>supply(ies) etc.
>>>>
>>>>If this would work, at least theoretically, could it be made to work in time for
>>>>the DF/Kramnik match in October?  NASA could send a man to the moon and they met
>>>>their deadline, and this seems a much simpler task.  Would a “can do attitude”
>>>>[and $$$] get this job done in time?
>>>>
>>>>Bob D.
>>>
>>>Sorry Bob,
>>>
>>>You don't get much benefit out of this type of set-up. (This is very similar to
>>>a "cluster", I believe). The best thing about it is that you could have each CPU
>>>& board, "guess" a different answering move by the opponent, and begin working
>>>on the best reply on the opponent's time. IMHO
>>>
>>>Because of the nature of alpha-beta, the bandwidth get's plugged.
>>>
>>>This is what I've always been told. REALLY would like to see it done though, and
>>>thoroughly tested with excellent components. (No slow CPU's)
>>>
>>>Sometimes you THINK it just HAS to be (good/bad/indifferent), but you don't know
>>>for stink until it's really tested. Even then, you only know with the EXACT
>>>set-up you have at that moment of testing.
>>>
>>>Everything else is only fine speculation.
>>>
>>>David
>>
>>Thanks for responding.  I was beginning to think that I was "persona non grata"
>>on this bulletin board.
>>
>>I deliberately avoided getting into any details, at all, to keep the
>>presentation as conceptually simple as possible.  Perhaps that's why you saw it
>>as being in the nature of "speculation."  I was an engineer [Electrical,
>>Aerospace, & Systems] for thirty years and have a considerable amount of design
>>[& testing] experience.  I really do appreciate that initial conceptualization
>>is merely the very first step and that there would be a lot of detail design and
>>testing remaining.  I'm not sure it would take so long, though, if the right
>>people were on the job.
>>
>>About the issue of "similarity to clusters": The suggested partitioning concept
>>may distinguish this concept from the clustering concepts you are referring to.
>>I don't know, since there may be more than one "clustering" concept in the
>>literature and I don't know which one(s) you are referring to.
>>
>>About the issue of "working on the opponent's time":  Once a move is made by the
>>COMPUTER, then the COMPUTER could follow the identical procedure it followed
>>after receipt of the opponent's move.  It is like "Infinite Mode" in Fritz.
>>Once a move is made, Fritz in Infinite Mode doesn't care who's move it is.  It
>>doesn't change anything.  Fritz then goes about finding the next move, even
>>though the move is for the opposite side.
>>
>>About the issue of "no slow computers."  Yes, I included them merely to simplify
>>the presentation of the concept.  The tasks I showed for the "slow computers"
>>would be done by one or more of the "fast" computers.
>>
>>About the issue of "alpha/beta and bandwidth":  I really don't see how this
>>concern is applicable at all in this configuration.  [Help!]
>>
>>Please rethink the concept of partitioning of all possible two-move sequences.
>>This is another case where I deliberately avoided getting into detail.  The
>>partitioning should be done in a "smart" manner to minimize the likelihood of
>>one of the "fast" computers wasting time on moves which were no good, while
>>others were working on the only good moves.  I see a performance improvement by
>>using this "smart partitioning."  The basic concept is to arrange matters so
>>that none of the eight computers would be duplicating the work of any of the
>>others.
>>
>>Similarly, the idea that each of the eight computers would forward only one move
>>[with supporting information] was a simplification made to make the concept
>>presentation simple.  A performance improvement might be expected if each of the
>>eight computers were to forward information on several moves.
>>
>>There are MANY other essential details not presented.  I was looking first for
>>an input as to whether or not the basic concept was essentially sound.
>>
>>Calculation of the maximum theoretical gain from this concept is something of
>>interest.  I was hoping for close to an 8X improvement in effective computation
>>time over a single "fast" computer.  I have deliberately ignored pruning effects
>>because I considered them to be "second order" and primarily at greater ply
>>depths [i.e. **inside** the "fast computers."] I would not have expected pruning
>>within the first two ply.
>>
>>Bob D.
>
_ _ _ _ _ _ _ _ _ _ _ _ _ _
David:

You have some very interesting ideas in your post.  But before getting to them,
I want to make sure we are "singing on the same sheet of music," so to speak.
My interest in this topic developed while thinking about the upcoming DF/Kramnik
match in October.  I tried to imagine a configuration which would have a
reasonable chance of being built in time for that match.  Since no one has
posted anything here about the computer that is to be used for that match, I
found myself speculating as to what could possibly be planned.  The first
thought that came to me was:  "Whatever it is, it's going to have to be
something they can build and deliver by August, if Kramnik is to have an
opportunity to play with it for a few weeks before the match."  This led to
recall of the often quoted principle "Keep it Simple Stupid [KISS]."  To me,
that meant that anything that would require extensive design, development, and
testing would be absolutely out of the question.  This prompted the question
"What could be put together in just a few months?"  Although non-optimal, the
configuration I asked about seemed to me to be one possible choice for that
match.

The first thought was to have separate chess computers operating almost
independently.  This would mean almost no communication between the individual
computers during most of the time between the opponent's last move and the
computer's next move.  Similarly, for "thinking on the opponent's time," there
would be "almost no communication between the individual computers during most
of the time between the computer's last move and the opponent's next move."

This obviously would imply a performance hit.  But to compensate for that
inefficiency, it would be possible to use VERY fast processors.  The added speed
of the eight individual computers was imagined to more than compensate for the
inefficiency, in view of the schedule constraint.  Currently available
8-processor computers are not very fast.  I doubt that any new processors would
be built just for this match.  The cost would be prohibitive.
_ _ _ _ _ _ _ _ _ _ _ _ _ _

>About a month or so back, a poster gave out a url for a "cluster" which I
>loosely define as an arrangement of CPU's with their own memory space - ie. no
>shared memory.
>
>Just for fun, think of a single processor (regular) computer, and into each of
>the PCI slots (could be any kind of slots, however) on the motherboard, we'd
>have a single card computer (CPU, memory, etc.) which would communicate directly
>over the PCI bus with the master CPU.
>
>It's no problem having the CPU's not repeating each other's work in the search,
>but the cost of that is a tremendous amount of communication with the master
>CPU.

Whenever I see words like "tremendous amount," I immediately ask "How much?"
Before your total concept is rejected, it seems necessary to quantify this and
then, and only then, reach any conclusions as to whether or not this is a "show
stopper."

>By the time you had filled all your PCI slots with these single card
>computers, your whole system I BELIEVE, would bog down horribly from the
>communications bottleneck of the PCI bus. I HAVE NO FACTS REGARDING THIS!!
>of actual tests done.

Well, your guess is as good as mine.  But you should still want to
quantitatively evaluate this before rejecting your idea.

>
>>About the issue of "alpha/beta and bandwidth":  I really don't see how this
>>concern is applicable at all in this configuration.  [Help!]
>
>The whole problem with your "partitioning" of the work, and the hoped for
>speed-up, etc., seems to be based on a very intelligent search of the game >
tree.

Generally, I agree that a "smart" partitioning strategy would entail some
up-front analysis in preparation for assignment of the different possible
two-ply sequences to the eight "computers" or processors.

It seems that there are many possible ideas for "smart" partitioning.  One idea
might be as follows:  Based on a very quick preliminary assessment [performed
for a dozen seconds or so after the last move was made] a list of all possible
two-ply move sequences, listed in priority order, would be produced.  The top
priority move could then be assigned to the first fast processor, the next
highest to the next processor, etc., cycling back to the first processor for the
ninth highest priority move, and continuing this process until all possible
two-ply move sequences had been allocated.  Then the eight processors could
begin their analyses in earnest.

>Such a search (because remember, we don't KNOW in advance, what moves are good
>ones, otherwise we'd only choose those, hell with the search :-) ), needs a ton
>of communication to divide up the work amongst all the processors as the game
>tree takes "shape" during the search.

About communication:  In your configuration, this communication would be at
about 400 MB/sec at the most.  It is not clear that "a ton" of communication is
needed, however.  It seems to depend on whether or not you intend communication
between the processors as they are working.  If not, maybe not so much
communication is needed.

>
>I don't believe the bandwidth (maximum capacity) of the bus would be anywhere
>near enough to allow the search to work in this manner, with all the
>communication needed, nearly as fast as you hoped.

In the configuration which I had imagined, each processor would be in it's own
box, and each box would be rack mounted in a standard 19" rack.  The individual
computers would be connected with a network.  The question this immediately
raises is:  How fast could such a network be?

In the Winn L. Rosch Hardware Bible, 5th Edition, 1999, Rosch talks about
something like this in his Chapter 20, "Peripheral Ports."  One thing that
caught my eye was his Table 20.1, "A Comparison of Serial Interfaces," in which
he indicates that IEEE 1394 "FireWire" technology gives a data rate of 800 mbps
with 3,200 mbps proposed.  This FireWire is mainly discussed in terms of
communication between a motherboard and a hard drive memory [SCSI-3] but it
seems to me that also indirectly implied is that communication between separate
computers would not be limited by the peripheral port [serial port] involved if
it were IEEE 1394 or something similar.  The speed of the internals of the
computers would probably present the practical limitation.  Incidentally, I have
not yet looked at the IEEE 1394 Standard, but it should be readily available on
the Internet.

>
>Just my guess here, but I'd say maybe only an extra 1 - 2 ply in the search,
>even with all that hardware.

It was more useful to me to think about the speed increase.  If a single
computer were to take eight minutes to find a certain hard-to-find move, then
maybe the "divide and conquer" strategy would find that same move in a minute or
two.

Maybe a better measure would be rating points increase over just using one
processor.  Although inefficient, when compared to the case where the eight
processors were in continual intelligent communication at all times, there still
should be a rating points increase.  It might not take much of an increase in
rating points to make the difference between Kramnik winning [as he would most
likely do against a single processor Fritz] and the 8-processor Fritz chess
computer winning.

>
>In implementing such a beast, it would be desireable to have only the most
>minimal info returned from the slave CPU's to the master CPU.
>
>The performance at best would be just a tad better than a single fast CPU
>set-up, but it would cost as much or more than some quad's (which really do run
>the parallel chess programs quite fast).

I would prefer to quantify this before characterizing it as "just a tad."


>
>I'm sorry I can't give you any FACTS in this matter. In all the years I've been
>following computer chess, I've never seen a beastie configured like this,
>actually optimized and tested with a chess program.

Guess what.  I can't produce any FACTS either.  We are just thinking out loud.
That should be OK.


>
>Perhaps someone else can help out?
>
>David



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.