Author: Vincent Diepeveen
Date: 13:06:53 03/05/01
On March 05, 2001 at 11:39:51, José de Jesús García Ruvalcaba wrote:
>On March 05, 2001 at 11:18:36, Robert Hyatt wrote:
>
>>On March 05, 2001 at 03:07:09, Pham Minh Tri wrote:
>>
>>>Hi all,
>>>
>>>If a chess program runs on a computer with N processors, how many times is it
>>>faster? N times? (compare with the same computer with 1 processor)
>>>
>>>Just a curious question because I have no computer with multiprocessor nor
>>>supercomputer :-(
>>
>>
>>There are two answers.
>>
>>1. If the program was written to use a parallel search, then you get one
>>answer. I can't speak for everybody, but for the case of Crafty, the formula
>>is roughly this:
>>
>>speedup = 1 + (N-1)*.7
>>
>>where N is the number of cpus you have.
>>
>>IE for my quad, this comes out to roughly 3.1 times faster using 4 processors.
>>
>
>But that is sharing the transposition tables. What sort of speedup can we expect
>for message-passing chess programs (in which every processor has its own
>hashtables)?
>José.
Right, that's sharing the transposition table and many other things.
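Hyatt's rule of thumb for Crafty quoted above, speedup = 1 + (N-1)*0.7, is easy to tabulate. The sketch below only encodes that quoted formula. The 0.7 factor is specific to Crafty and approximate, and the function name `crafty_speedup` is illustrative.

```python
def crafty_speedup(n_cpus: int, efficiency: float = 0.7) -> float:
    """Rough parallel speedup from the quoted rule: 1 + (N - 1) * efficiency."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return 1 + (n_cpus - 1) * efficiency

# Each extra CPU is assumed to add 0.7 of a CPU's worth of speed.
for n in (1, 2, 4, 8):
    print(n, round(crafty_speedup(n), 2))
```

For the quad this gives 3.1, matching the figure in the quote.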
I have done some calculations on this for DIEP.
Compared with the speedups some people claimed to get in the past, the
main problems in making a chess program work with message passing are
the following:
a) You of course want a strong program, and you want to compare against
the commercially sold version of that program, for a pretty simple reason:
if you first slow a program down in order to get a better parallel
speedup, that's pathetic. Example: a program gets 100k nps on a single
CPU, but its parallel version gets only 10k nps on one CPU; on 4 CPUs
the parallel version reaches a given ply 4 times faster than it does
on a single CPU.
So in order to get a perfect (100%) parallel speedup we first made the
program 10 times slower. This kind of nonsense happens a lot in most
supercomputer tests.
b) Nowadays, because of fast CPUs, big RAM and the good branching factors
of most programs, programs search pretty deep. Getting a good speedup
with plain alpha-beta is easier than getting a speedup with a program
that already reaches 15 ply...
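Why good branching factors mean deep searches: if the tree grows roughly like b^d for an effective branching factor b, the depth reachable with a node budget N is about log N / log b. The numbers below (100k nodes per second, 3 minutes a move, b = 3 versus b = 6) are assumptions for illustration, and the model ignores constant factors.

```python
import math

def reachable_depth(nodes: float, branching_factor: float) -> float:
    """Depth d such that branching_factor ** d == nodes (tree size ~ b^d)."""
    return math.log(nodes) / math.log(branching_factor)

budget = 100_000 * 180  # assumed: 100k nps for 3 minutes
for b in (3.0, 6.0):
    print(b, round(reachable_depth(budget, b), 1))
```

With b = 3 the budget reaches about 15 ply; with b = 6 only about 9.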
c) CPUs got faster than communication did.
A few years ago, especially in the transputer era, you could afford to
communicate almost anything, even the weather, to other CPUs. Now a
single fast processor produces gigaflops on its own, so in fact there
is more information to share.
d) When hashtables were invented, their importance was not that big.
Nowadays, however, many programs are driven completely by the hashtable.
In DIEP, for example, I don't even keep track of a main line; I get it
out of the hashtable. That's how convinced I am (or how lazy) that such
a variation is no worse than a line that gets specially recorded
during the search.
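Getting the main line out of the hashtable, as described for DIEP, amounts to following the stored best move from position to position. This is a minimal sketch with hypothetical names (`TTEntry`, `pv_from_table`) and toy string keys instead of real hash keys; it is not DIEP's code. The line ends wherever an entry was overwritten or never stored, which is why such a line can be shorter than one recorded during the search.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional

@dataclass
class TTEntry:
    best_move: Optional[str]
    depth: int

def pv_from_table(root: Hashable,
                  table: Dict[Hashable, TTEntry],
                  make_move: Callable[[Hashable, str], Hashable],
                  max_len: int = 64) -> List[str]:
    """Rebuild the main line by following stored best moves from the root."""
    line: List[str] = []
    seen = set()
    pos = root
    while len(line) < max_len and pos not in seen:
        entry = table.get(pos)
        if entry is None or entry.best_move is None:
            break  # entry overwritten or never stored: the line just ends here
        seen.add(pos)  # stop on repetitions so we never loop forever
        line.append(entry.best_move)
        pos = make_move(pos, entry.best_move)
    return line

# Toy positions are move-history strings; a real engine would key on a hash of the position.
toy_table = {
    "start": TTEntry("e4", 8),
    "start e4": TTEntry("e5", 7),
    "start e4 e5": TTEntry("Nf3", 6),
}
print(pv_from_table("start", toy_table, lambda p, m: f"{p} {m}"))  # ['e4', 'e5', 'Nf3']
```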
e) Latency versus bandwidth. The bandwidth of today's clusters is
not bad at all; it's very fast. But latency sucks. The time needed
to ship 1 message and wait until the answer returns is dead, dead slow.
For example, if you try this at home on your 100mbit Ethernet network
and send DATAGRAMS (already way faster than TCP/IP, which does its own
acknowledgements), you can do this a few thousand times a second.
A giganet network, which costs more than buying an 8-processor Xeon,
is not going to make you very happy here either, despite its price of
tens of thousands of dollars for the switch+cards: it is at most 10
times faster than the turtle called 100mbit cards.
So in short, if you want to do this at home, the huge prices of fast
network cards limit you to a couple of thousand messages a second,
which the WHOLE 100mbit network must share.
Now suppose I borrow a few computers from my neighbours and put them
all on a cheap 8-port hub, which costs only $100 here. Each 100mbit
card is about $20 here. That's a cheap, affordable network.
Its communication latency, however, is measured in milliseconds.
That's not very helpful!
And whereas on clusters you might still do something with the big
bandwidth, the bandwidth of a 100mbit hub is pathetic too, as everything
must be divided by 8!
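The "divide by 8" arithmetic for a shared hub, as a sketch. It assumes an even split between the 8 machines and about 2000 messages per second for the whole network ("a couple of thousand"); collisions on a real hub would make things worse, which this ignores.

```python
def per_node_share(link_mbit_s: float, messages_per_s: float, nodes: int):
    """Split a shared hub's bandwidth and message rate evenly over its nodes."""
    mbit_each = link_mbit_s / nodes
    mbyte_each = mbit_each / 8          # 8 bits per byte
    msgs_each = messages_per_s / nodes
    return mbit_each, mbyte_each, msgs_each

mbit, mbyte, msgs = per_node_share(100, 2000, 8)
print(mbit, mbyte, msgs)  # 12.5 1.5625 250.0
```

At an assumed 100k nodes per second, 250 messages a second is one message per 400 nodes searched.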
In short, you can't share the hashtable at all. Even asking other CPUs
for hashtable information only at depths near the root is very
dangerous: it means you will surely lose on time, even in 20 0 games.
The obvious conclusion is that you need to search without sharing
information. Now that's a pretty bad idea: it not only requires a lot
of work, but the speedup would be really pathetic.
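The simplest way to search without sharing information is to give each machine its own root moves and let it search them independently (root splitting). That scheme is an assumption for illustration, not necessarily what DIEP would do. On a synthetic tree (hypothetical `leaf_score`, branching factor 4, depth 6), the independent full-window searches find the same value. However, they lose the alpha bound that a single searcher carries from one root move to the next, so together they visit at least as many nodes as one sequential alpha-beta search.

```python
import hashlib

BRANCH, DEPTH = 4, 6
INF = 10**9

def leaf_score(path):
    """Deterministic pseudo-random score in [-100, 100] for a move path (toy tree)."""
    return hashlib.sha256(repr(path).encode()).digest()[0] % 201 - 100

def alphabeta(path, depth, alpha, beta, counter):
    """Fail-soft negamax alpha-beta; counter[0] counts visited nodes."""
    counter[0] += 1
    if depth == 0:
        return leaf_score(path)
    best = -INF
    for move in range(BRANCH):
        score = -alphabeta(path + (move,), depth - 1, -beta, -max(alpha, best), counter)
        best = max(best, score)
        if best >= beta:
            break  # beta cutoff
    return best

def sequential_root(depth):
    """One searcher: each root move gets the bound from the moves before it."""
    counter, best = [0], -INF
    for move in range(BRANCH):
        best = max(best, -alphabeta((move,), depth - 1, -INF, -best, counter))
    return best, counter[0]

def unshared_root(depth):
    """Root splitting without sharing: every root move is searched with a full window."""
    counter = [0]
    scores = [-alphabeta((m,), depth - 1, -INF, INF, counter) for m in range(BRANCH)]
    return max(scores), counter[0]  # node count summed over all "machines"

seq_value, seq_nodes = sequential_root(DEPTH)
par_value, par_nodes = unshared_root(DEPTH)
print(seq_value == par_value, seq_nodes, par_nodes)
```

Even with perfect load balance, the unshared total can only match or exceed the sequential node count, so the extra machines partly pay for work a single searcher would have pruned.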
You certainly cannot play blitz with such a machine, not even rapid,
as you lose a few seconds on every move anyway!
If you get any speedup with this at 3 minutes a move, that would
already be a decent result for a lot of work.
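The cost of losing "a few seconds" on every move can be expressed as the raw speedup needed just to break even. This sketch assumes a fixed 3-second overhead per move. The time controls are illustrative: 7.5 seconds a move is roughly a 5-minute blitz game spread over 40 moves.

```python
def break_even_speedup(seconds_per_move: float, overhead_s: float) -> float:
    """Raw speedup needed just to make up for a fixed per-move overhead."""
    if overhead_s >= seconds_per_move:
        raise ValueError("overhead eats the whole move")
    return seconds_per_move / (seconds_per_move - overhead_s)

for t in (7.5, 30.0, 180.0):   # blitz-ish, rapid-ish, 3 minutes a move
    print(t, round(break_even_speedup(t, 3.0), 2))
```

At blitz the cluster must already be about 1.67 times faster just to stand still; at 3 minutes a move the overhead costs under 2%.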
To match the efficiency of today's engines, which prune big parts of
the tree away with null move, you really must work hard!
You are certainly NOT going to beat a dual 1.0 GHz PIII with 8 nodes
on a 100mbit network, say 8 machines with 1.2 GHz K7s.
Of course, things are different on clusters, where latency and
especially bandwidth are much better. But where do you get system time
on a cluster? I couldn't get system time on, say, a 256-processor
cluster!
Vincent
>>2. If the program doesn't know anything about a parallel search, then it will
>>get no speedup whatsoever no matter how many processors you have.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.