Computer Chess Club Archives


Subject: Re: The King's News Clothes (Re: DB vs)

Author: Ernst A. Heinz

Date: 08:11:43 11/28/98

On November 24, 1998 at 17:17:07, Robert Hyatt wrote:
>
> [...]
>>
>>Bob,
>>
>>AFAIK your 30% overhead is only a good average approximation for lowly parallel
>>searchers on SMPs with *physically* shared hash tables. For massively parallel
>>searchers on machines with *physically* distributed memory I have not yet seen
>>any experimental data that *conclusively* supports such high parallel
>>efficiency. To the contrary, the only frank publications in this respect seem
>>to be the articles by the "StarTech" and "StarSocrates" groups who admit to
>>something like an application speedup of only 50-60 on a CM-5 with 512 CPUs
>>which translates to a parallel efficiency of 10%-15% for their Jamboree search.
>>Most other researchers who reported higher relative speedups for their
>>massively parallel implementations on distributed-memory machines either failed
>>to account for the increases in hash-table sizes or used horribly inefficient
>>sequential implementations as their point of reference.
>>
>>=Ernst=
>
>This isn't really an issue about 'shared hash tables'.

Yes, sorry, this was an obvious typo. :-(

I meant *physically* shared memory, which allows for efficient scheduling and
sharing of work among the parallel processors (e.g. your DTS algorithm).
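To make the contrast concrete, here is a minimal sketch (not DTS itself --
the struct layout and function names are invented for this post) of how
physically shared memory lets an idle processor pick up work with nothing
more than a lock and a few loads and stores:

    /* Toy illustration only, not DTS; assumes POSIX threads and an
     * invented "split point" layout. */
    #include <pthread.h>

    #define MAX_MOVES 256

    typedef struct {
        pthread_mutex_t lock;              /* guards the shared move list    */
        int             moves[MAX_MOVES];  /* moves left at this split point */
        int             next;              /* index of next unsearched move  */
        int             count;             /* total number of moves          */
    } split_point_t;

    /* An idle processor "helps" simply by grabbing the next move under
     * the lock -- a handful of bus cycles, no message round-trip. */
    static int grab_next_move(split_point_t *sp)
    {
        int move = -1;
        pthread_mutex_lock(&sp->lock);
        if (sp->next < sp->count)
            move = sp->moves[sp->next++];
        pthread_mutex_unlock(&sp->lock);
        return move;                       /* -1 means nothing left to do */
    }

On a message-passing machine the same hand-over costs at least one
request/reply round-trip per piece of work.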

[BTW, physically shared hash tables also improve the efficiency of these
shared-memory implementations -- primarily because of better move ordering.]
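The kind of probe I have in mind looks roughly like this (all field and
function names are made up for illustration, this is nobody's actual engine
code); since every processor hashes into the same physically shared table, a
best move stored by one processor immediately helps the move ordering of all
the others:

    #include <stdint.h>

    typedef struct {
        uint64_t key;        /* full Zobrist key for verification */
        int16_t  score;
        uint8_t  depth;
        uint8_t  flags;      /* exact / lower bound / upper bound */
        uint16_t best_move;  /* the move that caused the cutoff   */
    } tt_entry_t;

    #define TT_SIZE (1u << 22)             /* 4M entries, say        */
    static tt_entry_t shared_tt[TT_SIZE];  /* lives in shared memory */

    /* Return the stored best move (0 if none) so that the probing
     * processor can search it first. */
    static uint16_t tt_probe_move(uint64_t zobrist_key)
    {
        tt_entry_t *e = &shared_tt[zobrist_key & (TT_SIZE - 1)];
        return (e->key == zobrist_key) ? e->best_move : 0;
    }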

>Hash tables don't
>give a factor of 2 in the middlegame based on results I've gotten.  This is
>about "process granularity".  The *Socrates machine uses a message-passing
>protocol that is inherently slow.  I've run on such machines (i.e. the CM-5
>for one, the T3D/E for another) and this causes serious problems.  The
>big + for shared memory is instant communication, so that threads can
>share information without regard to "cost".

Right (see my comments above).

>The DB machine doesn't suffer from the huge CM-5 type cost, they only use
>16 (or 32) cpus, and each CPU talks to the chess processors at bus speeds,
>not at 2 microseconds/message or whatever as in the CM and other
>architectures.  In fact, the DB (last edition) chess processors didn't use
>transposition tables, only the search done on the SP did.

I agree with your argument regarding the communication between the chess
processors and the host CPUs of the SP. I remember somebody on the DB team
mentioning that the limiting factor of this communication link was *not* the
chess processors but in fact the host CPUs.

So far, so good. Yet the SP itself looks like an extremely poor machine for
parallel chess because it is essentially a cluster of workstations coupled by
a special interconnect that features nice throughput but horrible latency,
AFAIK. With respect to latency, the interconnect of the SP is *much worse*
than those of the CM-5 and the T3D/T3E. As you and I have already said, it is
extremely hard to squeeze acceptable parallel alpha-beta search performance
out of such high-latency distributed-memory machines -- even if you "only"
use 32 CPUs.
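A crude cost model shows why the latency matters so much; every concrete
number below is an illustrative assumption on my part, not a measurement of
any real machine:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed one-way latency per message (seconds). */
        double lat_cm5 = 2.0e-6;   /* the ~2 microseconds/message quoted above */
        double lat_sp  = 40.0e-6;  /* pure assumption for an SP-class
                                      workstation-cluster interconnect        */
        /* Assumed message rate per node for work scheduling and
           result collection in a distributed alpha-beta search. */
        double msgs_per_sec = 5000.0;

        printf("CM-5-like: %.0f%% of every second spent waiting on latency\n",
               100.0 * msgs_per_sec * lat_cm5);
        printf("SP-like  : %.0f%% of every second spent waiting on latency\n",
               100.0 * msgs_per_sec * lat_sp);
        return 0;
    }

With figures like these the communication overhead alone eats a large slice
of the theoretical speedup before any search overhead is even counted.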

==> As I have already proclaimed several times before, it is not the chess
    processors but rather the 32-node SP host machine that looks like the
    obvious bottleneck of "Deep(er) Blue".

I wonder if and how the "Deep(er) Blue" team succeeded in achieving more than
30% parallel search efficiency on the SP.
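For reference, the efficiency figures in this thread are nothing more than
observed speedup divided by processor count; a few lines make the arithmetic
explicit:

    #include <stdio.h>

    int main(void)
    {
        /* StarTech/StarSocrates on a 512-CPU CM-5: speedup of about 50-60. */
        printf("CM-5 efficiency: %.0f%% .. %.0f%%\n",
               100.0 * 50.0 / 512.0, 100.0 * 60.0 / 512.0);

        /* 30% efficiency on the 32-node SP host would require a speedup
           of 0.30 * 32 = 9.6 over a single SP node. */
        printf("Speedup needed for 30%% on 32 nodes: %.1f\n", 0.30 * 32.0);
        return 0;
    }

The first line is where the 10%-15% range quoted above comes from.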

=Ernst=


