Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: CilkChess question for Don

Author: Don Dailey

Date: 13:54:55 01/31/99

Go up one level in this thread


On January 31, 1999 at 11:27:23, Robert Hyatt wrote:

>Don...  was cleaning out some files on Friday, and I found something about
>cilkchess I had printed somewhere in the past, a discussion about cilkchess
>and its implementation details on a SMP platform.
>
>One thing stood out like a sore thumb that I wanted to ask about...  You
>mentioned that you don't 'lock the hash table at all'.  And I'm wondering how
>you make that work.
>
>IE suppose you have a simple implementation that requires two words, one for
>the 'signature' and one for the 'value'.  Since this is two words long, it
>takes two instructions to store the entry and two to access the entry. What
>do you do when the 'reader' reads the first word, but gets blocked by an
>interrupt (or bus conflict or whatever) so that the 'writer' stores a new
>signature _and_ value?  Because now the reader will get the second word and
>it does _not_ go with the first.  So you get a wrong value.
>
>Harry and I spent several weeks debugging this in Cray Blitz way back, as I
>was getting 'bad move from hash table' messages regularly, but only when
>using more than one processor, so I was _certain_ it wasn't a real hash
>signature collision.  We tracked it down to the above problem, and put a
>'lock' around the few instructions that load or store.  IE here is what I do in
>Crafty now:
>
>HashProbe:
>
>  Lock(lock_hasha);
>  word1=htable->word1;
>  word2=htable->word2;
>  UnLock(lock_hasha);
>
>'word1 and word2' are local, while htable-> points to a globally shared hash
>table.
>
>HashStore:
>
>    Lock(lock_hashb);
>    htableb->word1=word1;
>    htableb->word2=word2;
>    UnLock(lock_hashb);
>
>If I remove the locks, I get _serious_ problems, because the signatures will
>match, but the 'best move' will fail validation.  Which also means that the
>score, the draft, the threat flags, etc are _all_ wrong for this position.
>
>Do you fix this?  Do you even detect this?  Or do you just ignore it, which
>would seem to be impossible based on tests with CB and Crafty???
>
>Bob

HI Bob,

This one has been bothering me for a long time.  But we do absolutely
nothing.  We are completely aware of the possible difficulties but I
have never traced any problems to it.

I did do a test to give me at lease some confidence that it was ok
to do this for the time being.  I ran a test suite of problems for
many repetitions.   I used about 20 problems,  let them run for about
2 days and this represented a LOT of iterations running the same
problems.   The problems were always solved on the same depth although
occasionally the scores varied very slightly (but this is common with
a parallel program.)  That is except for a single problem which was
known anyway to have flaky behavior on that particular version of
Cilkchess.  I even tested this by doing a simple test with a different
hash key generator (different table of random numbers.)

I have never felt very comfortable with this and am surprised that I
don't see a lot more trouble.  Mabye Cilkchess is 100 points weaker
because of this I don't really know.   It's silly that we haven't
just done the lock because it's trivial to implement.  The issue of
course is will there be a substantial performance penalty.  There
may be when we use a lot of processors.   I may see more problems
when I use more processors too.

What have you noticed concerning performance penalties?   I know you
do not hash into the quies search so it might be less of an issue
for you.  We can always back off an extra level or two to make up
for the performance hit if it's substantial.

When I do the lock test I'll inform you of the results.  What do you
actually do to observe this problem?  Do you just put in code that
observes the conflict?

There is also another thing that might be a factor.  The real issue
when it come to any hash table conflict should not be how often it
affects some node, but how often does it change the root move?  We
figured out that this is worth several bits.   I did a 32 bit key
version of a previous program and compared it to a 64 bit key version
by a lot of self test games and found that I could not observe that
the program was weaker despite the fact that it was computed that we
should be getting an illegal match at least once every second or two.

It may be similar with this Bob.  Pehaps we are getting a relatively
high number of these bad matches but maybe it has relatively little
affect on the performance?   If this is true, then we might actually
take a significant performance penalty for "fixing" the problem!

It's an ugly thought, isn't it?   I think this might be a case where
peace of mind is more important than a couple of rating points!



- Don



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.