Computer Chess Club Archives




Subject: Re: Parallel algorithms in chess programming

Author: Vincent Diepeveen

Date: 15:45:09 04/16/01

Go up one level in this thread

On April 16, 2001 at 18:15:52, Dieter Buerssner wrote:

>In a different discussion, Vincent wrote the following:
>>It is not difficult to implement the form of parallellism as used by
>>Rudolf. Invented by a frenchman who couldn't spell a word english and
>>who wrote an impossible article for JICCA (did anyone proofread it at
>>the time as i'm pretty sure they didn't get his parallel idea?).
>>At the time when i read the article i was pathetically laughing about it
>>actually as i also didn't get the idea of the frenchman. But it appears
>>everyone who can make a chessprogram work under win2000 can also get
>>within an afternoon his program parallel to work. Then some debugging
>>and a day later it works cool.
>I'd be very interested in this algorithm, that can be implemented at an
>afternoon :-)
>Could you point elaborate on this.

Answer is: YES...

Ok i'll give a small practical description here. For source code
to implement some features or in depth questions email me.

There are several forms of parallellism and from the
shared memory parallellisms a very important difference is:
  a) dividing the search over the different processors
  b) completely independant searches which get speeded up by
     a side effect

Nearly all papers are about a. Of course it took a frenchman
to find a cheapo algorithm for b.

God knows who proofread that article of the frenchman, probably
the proofreader was only Jaap v/d Herik and he probably only corrected
spelling of english as most likely the frenchman couldn't spell english
at all.

Anyway, Rudolf Huber managed to read it luckily.

The idea is pretty straightforward. Just start your program which we
will call P1 from process 1.

It is completely the same as normal. No hard changes.

Secondly we start a second program P2. This is nearly the exact same
program as P2 except it can't do i/o. This program is only analysing
for example.

To not let P2 always analyze the root position we give moves made
by P1 to P2. Understand me well we talk about moves made in the GAME,
not in the search.

So that's pretty little communication from P1 to P2. Probably smart
is also to give it expected moves by P1 and newgame commands.

But that's about all you need to give from P1 to P2.

Now where is the speedup?

Well in the description as above so far there is no speedup at all.
The speedup comes from a small side effect. Both P1 and P2 share the

So sometimes P1 will be speeded up by transpositiontable cutoffs which
P2 has neatly stored for it in the hashtable.

Even the most deterministic program in the world will be speeded up
bigtime by this, as after a few microseconds already both processes
will not search and evaluate the same positions at the exact same time.

Expected speedup of this algorithm should be around 1.6

What do you need to modify to your engine to get it to work?

  a) let p1 start P2 (initially you might want to use
     _spawnl(), later when it works ok CreateProcess() )
     Let P2 follow all things P1 is doing and also
     not let P2 ask for keyboard input.
     Even not shipping predicted moves is more than ok. just
     filling hashtable is all what P2 needs to do.
  b) create a shared hashtable and lock it before you fill it

to talk more about locking hashtable and shipping moves
to the other process: just like sharing
hashtable you can share another piece of memory to do all this.

Very important thing for shared memory is to know how volatile
variables work.

if i have a global variable that is called:
  int a;

And process P1 says:
  a = 1;

And process P2 says some later:
  a = 2;

then it might be that both processes have for quite some time a
different value in both processors when it is referenced.

If i do however
  volatile int a;

then the compiler always LOADS the variable from memory.

So a variable which is indicating whether a piece of code is locked
HAS TO BE volatile.

  volatile int lock;

To lock a variable browse crafty a bit and probably somewhere
like chess.h or some similar file you'll find definition for
msvc for lock() and unlock.

Note that sometimes system commands for locking go to the same assembly
as in chess.h.

There are some pitfalls which you might run in, if you can't figure something
out then don's HESITATE to ask others who already experienced parallel
problems, otherwise solutions might take months!

When it works, tips to improve speedup have to do with hashtable:
  a) use several probes in the hashtable! I'm using 8 probes,
     perhaps too much for you, but if it is 1 or 2 now, try 4 probes
     with this algorithm!
  b) there is no need to lock the hashtable if you use a small trick
     as used by crafty. Note this trick requires that half of the entry
     is filled with hashkey, so it probably will eat more bytes an entry
     which for me is a major obstacle.
  c) search for smart overwrite approaches of entries. there are zillions
     of ways to
     do it. But don't use just 1 bit, that works awful for this algorithm
     during a game. 1 bit to overwrite is really very little. Use a bit or
     10 and simply store gamedepth+searchdepth or so and replace on that.

>BTW. In Paderborn, Roland Pfister also told me, that he knows this from Rudolf
>Huber, and he even started to explain it to me. Somhow, we (or me) got
>distracted, and I cannot remember the essential things.
>What I remember is, that the time consuming work, of making your
>search/evaluation routines free from all those global variables is not needed.

This page took 0.03 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.