Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: MTD(f) (was Re: New(?) search idea.)

Author: Don Dailey

Date: 16:46:40 01/23/98

Go up one level in this thread


>I would not say "trivial"... :)

I wrote Cilkchess first as a serial program and got it debugged (ok, it
turned out
it was not so well debugged!)    It took me about an hour to port it to
Cilk.
However keep in mind I knew it would be a parallel program.




>>Aske Plaat ported an old version of Crafty to Cilk.  Since Crafty
>>was designed to be a serial program he had some problems with all
>>the global state.  But even this did not prevent him from porting
>>it in a few days (even though he was completely unfamiliar with the
>>code.)   If anyone decides to write a parallel program I highly
>>recommend encapsulating all your position state (the board and
>>relevant state) into a structure.   If you are concerned that
>>creating new states for each node will be too expensive, don't
>>be.  Modern processors copy contiguous data extremely rapidly
>>and you avoid some work with keeping unmake information (you never
>>unmake a move.)   The same code (a big benefit of using cilk) will
>>produce a serial version of the program that runs very fast and I
>>promise it will be competitive with your current serial program.
>>(I think I hear someone in the background screaming something about
>>data cache but I can't quite make out what they're saying.)
>
>you need to run on a PC.  I once did this (copy state) and the best I
>could do was 5K nodes per sec on a P5/133...  when I got rid of the
>state
>copy and started make/unmake, I eventually got past 30K.  The PC only
>has a sustainable bandwidth of 60-70mb/second.  Compared to 32 gigabytes
>per second on a Cray.  Or maybe a gig/sec on your CM-5.  You will get
>killed copying large structures on a PC however...

This sounds way off to me but I don't know you tried to store.   You are
claiming
more than 6 to 1 difference for copying state.   This directly implies I
would need
about 8 processors to equal the performance of one serial processor with
make/unmake.
This means I would be twice as fast if I had NOT run on a 4 SMP's and
went back to
make/unmake.

When I started using this technique (to facilitate parallel programming)
I was extremely
concerned about this issue.   I pretty much resigned myself to biting
the bullet and
accepting a big slowdown for the greater benefit.    But when I actually
had a finished
product and looked at the numbers (and compared to my previous serial
program with
make/unmake) I was flabbergasted.  I could not tell the difference.  And
this was on
the intel platform using linux.     We had all these hacks in to reduce
the state size
too, we were using character variables for the board and so on.  But we
later figured
out all of this crap was not neccessary,  a few dozen bytes of state
copied per move
is nothing.   We took out the hacks and it sped up slightly!

I have no idea how to resolve this with your experience of 6 to 1 since
making moves
is less than 5% of our overhead and they include the state copy.

But you said something about copying large structures on a PC.   I am
not copying
large structures (there is no need.)   Just the board itself and a few
variables.

But even if someone thought this might be a big problem,  it's trivial
to solve.  at any
node near the end nodes drop the parallelism (our quies is not parallel)
and update
the state in place with make/unmake.   Even though our quies is not
parallel I still
do the state copies simply because it's no slower than make/unmake.

One very nice side benefit of the position structure (other than never
having to unmake
a move which itself is a minor speedup)  is that your code becomes much
cleaner.

- Don



hurt things much at all)  and update the state in




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.