Computer Chess Club Archives



Subject: Re: Multi processors chess question

Author: Vincent Diepeveen

Date: 13:06:53 03/05/01


On March 05, 2001 at 11:39:51, José de Jesús García Ruvalcaba wrote:

>On March 05, 2001 at 11:18:36, Robert Hyatt wrote:
>
>>On March 05, 2001 at 03:07:09, Pham Minh Tri wrote:
>>
>>>Hi all,
>>>
>>>If a chess program runs on a computer with N processors, how many times is it
>>>faster? N times? (compare with the same computer with 1 processor)
>>>
>>>Just a curious question because I have no computer with multiprocessor nor
>>>supercomputer :-(
>>
>>
>>There are two answers.
>>
>>1.  If the program was written to use a parallel search, then you get one
>>answer.  I can't speak for everybody, but for the case of Crafty, the formula
>>is roughly this:
>>
>>speedup = 1 + (N-1)*.7
>>
>>where N is the number of cpus you have.
>>
>>IE for my quad, this comes out to roughly 3.1 times faster using 4 processors.
>>
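
The rule of thumb quoted above is easy to check numerically. This is only a minimal sketch: the function name and the `efficiency` parameter are made up for illustration, and the 0.7 factor is the figure given for Crafty, not a general law.

```python
def estimated_speedup(n_cpus: int, efficiency: float = 0.7) -> float:
    """Rough speedup from the quoted rule of thumb: 1 + (N - 1) * 0.7.

    The name and the `efficiency` parameter are illustrative assumptions;
    0.7 is the per-extra-CPU factor quoted for Crafty.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return 1.0 + (n_cpus - 1) * efficiency


if __name__ == "__main__":
    # The quad (N = 4) case from the post comes out at roughly 3.1.
    for n in (1, 2, 4):
        print(n, round(estimated_speedup(n), 2))
```

For N = 4 this reproduces the "roughly 3.1 times faster" figure from the quad example.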
>
>But that is sharing the transposition tables. What sort of speedup can we expect

Right, that's sharing the transposition table and many other things.

>for message-passing chess programs (in which every processor has its own
>hashtables)?
>José.

I have done some calculations on this for DIEP.

The main problems in making a chess program use message passing,
compared with the speedups that some people claimed to get in the
past, are the following:
  a) You of course want a strong program, and you should compare against
     the commercially sold version of that program for a pretty simple
     reason: if you first slow a program down in order to get a better
     parallel speedup, that's pathetic. Example: a program that gets
     100k nps on a single CPU, while its parallel version gets 10k nps,
     and on 4 CPUs reaches a given ply 4 times faster than the parallel
     version does on a single CPU.

     So in order to get a 100% speedup we first made the program 10 times
     slower. This kind of nonsense happens a lot in most supercomputer
     tests.
  b) Nowadays, because of fast CPUs, big RAM and the good branching
     factors of most programs, programs search pretty deep. Getting a
     good speedup with plain alpha-beta is easier than getting a speedup
     with a program that already reaches 15 ply...
  c) CPU speed has grown faster than communication speed.
     A few years ago you could communicate even the weather to other
     CPUs, especially in the transputer period. However, fast processors
     now produce gigaflops even on a single CPU, so there is in fact
     more information to share.
  d) When hashtables were invented their importance was not so big.
     Nowadays, however, many programs are completely driven by the
     hashtable. For example, in DIEP I don't even keep track of a
     mainline; I get it out of the hashtable. That's how convinced (or
     lazy) I am that such a variation is no bigger crap than a line that
     gets specially recorded during search.
  e) Latency versus bandwidth. The bandwidth of today's clusters is
     not bad at all. It's very fast. But latency sucks. The time needed
     to ship 1 message and wait until the answer returns is dead, dead
     slow. For example, if you try this at home on your 100mbit ethernet
     network and you ship using DATAGRAMS (already way faster than
     TCP/IP, which works with acknowledgements itself), then you can do
     this a few thousand times a second. Now a giganet network, which is
     more expensive than buying an 8-processor Xeon, is not going to
     make you very happy here either, despite the tens of thousands of
     dollars you pay for the switch+cards, as it is at most 10 times
     faster than the turtle called 100mbit cards.
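
Point (a) above can be put into numbers. The sketch below is only an illustration: the function name is made up, and it assumes that time to reach a given depth is inversely proportional to nps, which is a simplification.

```python
def claimed_and_honest_speedup(serial_nps: float,
                               parallel_1cpu_nps: float,
                               scaling_on_n_cpus: float) -> tuple[float, float]:
    """Return (claimed, honest) speedup.

    claimed: parallel version on N CPUs versus the same parallel
             version on 1 CPU (what the test reports).
    honest:  parallel version on N CPUs versus the fast serial program.

    Assumption for illustration: time-to-depth scales as 1 / nps.
    """
    claimed = scaling_on_n_cpus
    honest = scaling_on_n_cpus * parallel_1cpu_nps / serial_nps
    return claimed, honest


if __name__ == "__main__":
    # Numbers from point (a): 100k nps serial, 10k nps parallel on 1 CPU,
    # and a 4x gain from running the parallel version on 4 CPUs.
    claimed, honest = claimed_and_honest_speedup(100_000, 10_000, 4.0)
    print(f"claimed: {claimed:.1f}x, honest: {honest:.1f}x")
```

With the numbers from (a), the test reports a 4x speedup while the 4-CPU machine is in fact slower than the original program on one CPU.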

So in short, if you want to do it at home, then because of the huge
prices for fast network cards you are limited to a couple of thousand
messages a second, which the WHOLE 100mbit network must share.

So suppose I borrow a few computers from neighbours and put them all on
a cheap 8-port hub, which is only $100 here. Each 100mbit card is about
$20 here.

Now that's a cheap network. So that's affordable.

However, this is going to give communication times measured in milliseconds.

That's not very helpful!

And whereas on clusters you might still do something with the big
bandwidth, the bandwidth on a 100mbit network is pathetic too, as
everything must be divided by 8!
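
The division by 8 looks like this as a back-of-the-envelope calculation. The 3000 messages a second is an assumed stand-in for "a couple of thousand", and splitting a shared hub evenly is a simplification (collisions on a real hub make it worse); the names are made up for illustration.

```python
def per_node_share(total: float, nodes: int) -> float:
    """Evenly split a shared network resource over the nodes on the hub.

    Illustrative helper only: an even split ignores collisions and
    protocol overhead on a real shared hub.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return total / nodes


LINK_MBIT_PER_SEC = 100      # the 100mbit network from the post
NODES = 8                    # the 8-port hub from the post
MESSAGES_PER_SEC = 3000      # ASSUMPTION: stand-in for "a couple of thousand"

if __name__ == "__main__":
    print(per_node_share(LINK_MBIT_PER_SEC, NODES), "Mbit/s per node")
    print(per_node_share(MESSAGES_PER_SEC, NODES), "messages/s per node")
```

Under these assumptions each machine gets 12.5 Mbit/s and a few hundred messages a second at best.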

In short, you can't share the hashtable at all. Even asking other CPUs
for hashtable information at depths near the root is very dangerous, as
it means you are certainly going to forfeit in 20 0 games already.
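
To see why a remote hashtable probe is so expensive, compare one round trip with what a CPU could have searched locally in the same time. This is a rough sketch: the 1 ms round trip is an assumption standing in for "milliseconds", the 100k nps figure is borrowed from the example in (a), and the function names are made up.

```python
def nodes_lost_per_probe(nps: float, round_trip_seconds: float) -> float:
    """Nodes a CPU could have searched locally while waiting for one reply."""
    return nps * round_trip_seconds


def seconds_blocked(probes: int, round_trip_seconds: float) -> float:
    """Total time spent waiting if every probe blocks for a full round trip."""
    return probes * round_trip_seconds


NPS = 100_000            # illustrative, taken from the example in (a)
ROUND_TRIP_S = 0.001     # ASSUMPTION: 1 ms, standing in for "milliseconds"

if __name__ == "__main__":
    print(nodes_lost_per_probe(NPS, ROUND_TRIP_S), "nodes per remote probe")
    print(seconds_blocked(2000, ROUND_TRIP_S), "seconds blocked for 2000 probes")
```

Under these assumptions every remote probe costs about as much as searching 100 nodes locally, and a couple of thousand blocking probes already eat a few seconds of clock time.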

The obvious conclusion is that you need to search without sharing
information. Now that's a pretty bad idea. It not only requires a lot
of work, but the speedup would be really pathetic.

For sure you cannot play blitz with such a machine. Not even at
rapid levels, as you lose a few seconds on each move anyway!

If you get a speedup with this at 3 minutes a move, that would be
already not bad for a lot of work.

Because to match the efficiency of today's engines, which nullmove
big trees away, you really must work hard!

For sure you're NOT going to beat a dual 1.0GHz PIII
with a 100mbit network and 8 nodes,
say 8 machines with 1.2GHz K7s.

Of course things are different on clusters, where latency and
especially bandwidth are much better. But where do you get
system time on a cluster?

I couldn't get system time on, say, a 256-processor cluster!

Vincent


>>2.  If the program doesn't know anything about a parallel search, then it will
>>get no speedup whatsoever no matter how many processors you have.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.