Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Parallel algorithms in chess programming

Author: Vincent Diepeveen
Date: 04:50:56 04/17/01
On April 16, 2001 at 21:13:20, Bas Hamstra wrote:

>On April 16, 2001 at 18:45:09, Vincent Diepeveen wrote:
>
>>On April 16, 2001 at 18:15:52, Dieter Buerssner wrote:
>>
>>>In a different discussion, Vincent wrote the following:
>>>
>>>>It is not difficult to implement the form of parallellism as used by
>>>>Rudolf. Invented by a frenchman who couldn't spell a word english and
>>>>who wrote an impossible article for JICCA (did anyone proofread it at
>>>>the time as i'm pretty sure they didn't get his parallel idea?).
>>>>
>>>>At the time when i read the article i was pathetically laughing about it
>>>>actually as i also didn't get the idea of the frenchman. But it appears
>>>>everyone who can make a chessprogram work under win2000 can also get
>>>>within an afternoon his program parallel to work. Then some debugging
>>>>and a day later it works cool.
>>>
>>>I'd be very interested in this algorithm, that can be implemented at an
>>>afternoon :-)
>>>
>>>Could you point elaborate on this.
>>
>>Answer is: YES...
>>
>>Ok i'll give a small practical description here. For source code
>>to implement some features or in depth questions email me.
>>
>>There are several forms of parallellism and from the
>>shared memory parallellisms a very important difference is:
>>  a) dividing the search over the different processors
>>  b) completely independant searches which get speeded up by
>>     a side effect
>>
>>Nearly all papers are about a. Of course it took a frenchman
>>to find a cheapo algorithm for b.
>>
>>God knows who proofread that article of the frenchman, probably
>>the proofreader was only Jaap v/d Herik and he probably only corrected
>>spelling of english as most likely the frenchman couldn't spell english
>>at all.
>>
>>Anyway, Rudolf Huber managed to read it luckily.
>>
>>The idea is pretty straightforward. Just start your program which we
>>will call P1 from process 1.
>>
>>It is completely the same as normal. No hard changes.
>>
>>Secondly we start a second program P2. This is nearly the exact same
>>program as P2 except it can't do i/o. This program is only analysing
>>for example.
>>
>>To not let P2 always analyze the root position we give moves made
>>by P1 to P2. Understand me well we talk about moves made in the GAME,
>>not in the search.
>>
>>So that's pretty little communication from P1 to P2. Probably smart
>>is also to give it expected moves by P1 and newgame commands.
>>
>>But that's about all you need to give from P1 to P2.
>>
>>Now where is the speedup?
>>
>>Well in the description as above so far there is no speedup at all.
>>The speedup comes from a small side effect. Both P1 and P2 share the
>>hashtable!
>>
>>So sometimes P1 will be speeded up by transpositiontable cutoffs which
>>P2 has neatly stored for it in the hashtable.
>>
>>Even the most deterministic program in the world will be speeded up
>>bigtime by this, as after a few microseconds already both processes
>>will not search and evaluate the same positions at the exact same time.
>>
>>Expected speedup of this algorithm should be around 1.6
>>
>>What do you need to modify to your engine to get it to work?
>>
>>Steps:
>>  a) let p1 start P2 (initially you might want to use
>>     _spawnl(), later when it works ok CreateProcess() )
>>     Let P2 follow all things P1 is doing and also
>>     not let P2 ask for keyboard input.
>>     Even not shipping predicted moves is more than ok. just
>>     filling hashtable is all what P2 needs to do.
>>  b) create a shared hashtable and lock it before you fill it
>>
>>to talk more about locking hashtable and shipping moves
>>to the other process: just like sharing
>>hashtable you can share another piece of memory to do all this.
>>
>>Very important thing for shared memory is to know how volatile
>>variables work.
>>
>>if i have a global variable that is called:
>>  int a;
>>
>>And process P1 says:
>>  a = 1;
>>
>>And process P2 says some later:
>>  a = 2;
>>
>>then it might be that both processes have for quite some time a
>>different value in both processors when it is referenced.
>>
>>If i do however
>>  volatile int a;
>>
>>then the compiler always LOADS the variable from memory.
>>
>>So a variable which is indicating whether a piece of code is locked
>>HAS TO BE volatile.
>>
>>  volatile int lock;
>>
>>To lock a variable browse crafty a bit and probably somewhere
>>like chess.h or some similar file you'll find definition for
>>msvc for lock() and unlock.
>>
>>Note that sometimes system commands for locking go to the same assembly
>>as in chess.h.
>>
>>There are some pitfalls which you might run in, if you can't figure something
>>out then don's HESITATE to ask others who already experienced parallel
>>problems, otherwise solutions might take months!
>>
>>When it works, tips to improve speedup have to do with hashtable:
>>  a) use several probes in the hashtable! I'm using 8 probes,
>>     perhaps too much for you, but if it is 1 or 2 now, try 4 probes
>>     with this algorithm!
>>  b) there is no need to lock the hashtable if you use a small trick
>>     as used by crafty. Note this trick requires that half of the entry
>>     is filled with hashkey, so it probably will eat more bytes an entry
>>     which for me is a major obstacle.
>>  c) search for smart overwrite approaches of entries. there are zillions
>>     of ways to
>>     do it. But don't use just 1 bit, that works awful for this algorithm
>>     during a game. 1 bit to overwrite is really very little. Use a bit or
>>     10 and simply store gamedepth+searchdepth or so and replace on that.
>>
>>>BTW. In Paderborn, Roland Pfister also told me, that he knows this from Rudolf
>>>Huber, and he even started to explain it to me. Somhow, we (or me) got
>>>distracted, and I cannot remember the essential things.
>>>
>>>What I remember is, that the time consuming work, of making your
>>>search/evaluation routines free from all those global variables is not needed.
>>>
>>>Regards,
>>>Dieter
>
>- Do you really get 1.6? Because Roland Pfister told me he got 1.2 with this.
>- Maybe ETC will help also here? At lower depths trying to play all moves to get
>a cutoff?
>
>Bas.

I am using a different algorithm, some improvements upon Cray Blitz parallellism
from Robert Hyatt as published in journal of ICCA.

I get 2.0 with it at 3 minutes a move (overhead is 4% but better
branching factor compensates for it).

That took however a year of work. The above algorithm i didn't check
speedup of it, but the speedup one gets with an algorithm in general
is heavily dependant upon the search.

If your search is not extending much and taking the above tips into
account (so many probes etcetera) then definitely 1.6 is what you should
get.

To get 1.7 or more like crafty gets you need to do a load of work, so
for those who are out for a quick speedup, getting more is difficult.

I heart 1.4 from Roland at IPCCC2001. Of course speedup also
depends upon the overhead of an algorithm.

Imagine the worst case behaviour of your program if single cpu.
I'm pretty sure that this algorithm in general improves it there!

As far as i know SOS gets around 1.6 out of this algorithm.

I do not know how many extensions Roland is doing, but if he does a lot
then it is very well possible that his speedup is less as your speedup
will be.

Also i have no clue how many probes he does in hashtable.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.