Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: CilkChess question for Don

Author: Don Dailey

Date: 20:29:16 01/31/99

Go up one level in this thread


On January 31, 1999 at 21:43:25, Robert Hyatt wrote:

>On January 31, 1999 at 16:54:55, Don Dailey wrote:
>
>>On January 31, 1999 at 11:27:23, Robert Hyatt wrote:
>>
>>>Don...  was cleaning out some files on Friday, and I found something about
>>>cilkchess I had printed somewhere in the past, a discussion about cilkchess
>>>and its implementation details on a SMP platform.
>>>
>>>One thing stood out like a sore thumb that I wanted to ask about...  You
>>>mentioned that you don't 'lock the hash table at all'.  And I'm wondering how
>>>you make that work.
>>>
>>>IE suppose you have a simple implementation that requires two words, one for
>>>the 'signature' and one for the 'value'.  Since this is two words long, it
>>>takes two instructions to store the entry and two to access the entry. What
>>>do you do when the 'reader' reads the first word, but gets blocked by an
>>>interrupt (or bus conflict or whatever) so that the 'writer' stores a new
>>>signature _and_ value?  Because now the reader will get the second word and
>>>it does _not_ go with the first.  So you get a wrong value.
>>>
>>>Harry and I spent several weeks debugging this in Cray Blitz way back, as I
>>>was getting 'bad move from hash table' messages regularly, but only when
>>>using more than one processor, so I was _certain_ it wasn't a real hash
>>>signature collision.  We tracked it down to the above problem, and put a
>>>'lock' around the few instructions that load or store.  IE here is what I do in
>>>Crafty now:
>>>
>>>HashProbe:
>>>
>>>  Lock(lock_hasha);
>>>  word1=htable->word1;
>>>  word2=htable->word2;
>>>  UnLock(lock_hasha);
>>>
>>>'word1 and word2' are local, while htable-> points to a globally shared hash
>>>table.
>>>
>>>HashStore:
>>>
>>>    Lock(lock_hashb);
>>>    htableb->word1=word1;
>>>    htableb->word2=word2;
>>>    UnLock(lock_hashb);
>>>
>>>If I remove the locks, I get _serious_ problems, because the signatures will
>>>match, but the 'best move' will fail validation.  Which also means that the
>>>score, the draft, the threat flags, etc are _all_ wrong for this position.
>>>
>>>Do you fix this?  Do you even detect this?  Or do you just ignore it, which
>>>would seem to be impossible based on tests with CB and Crafty???
>>>
>>>Bob
>>
>>HI Bob,
>>
>>This one has been bothering me for a long time.  But we do absolutely
>>nothing.  We are completely aware of the possible difficulties but I
>>have never traced any problems to it.
>>
>
>do you grab the hash move and verify it?  This is where I found the problem
>in Cray Blitz (I never had it in Crafty because after the very difficult task
>of debugging that little oversight, I have never forgotten to do the lock
>again....

Testing the move is like adding 2 or 3 bits (I'm guessing, but it's
some small number of bits) to the hash key.  It's a really good way
to measure false matches in the hash table. If you can spare about 3
bits or so, you can pick up the majority of conflicts in the hash table.
With 3 bits, you will identify a false match almost 90% of the time.
It's a neat trick and testing the move is pretty much the same
idea, except you get the bits for free since you store the move
anyway.   But it turns out that I don't test the move,  I generate
all the moves and push this one to the front if I run across it,
similar to how one might do the killers.   I could trivially do
your test though and probably should.  Technically, I should probably
try the hash move before doing any work but I never found this to be
worth anything for me.  Is this what you do?   I think it's because
there is rarely a hash move near enough to the end nodes to do you
much good.

--- snipped my stuff ----

>I have no idea how many such glitches it takes to change a move at the root.
>And I didn't catch the bug that way, I caught it because the hash table move
>kept failing to 'verify' and I couldn't see how that could possibly happen
>unless we were getting two identical hash signatures for two different
>positions.  My analysis convinced me this wasn't happening, because I could
>search forever with 1 processor with no problems, but with more than 1, it would
>produce errors, more processors, more errors...


Thanks for alerting me to this.  I'll do a test like you did and report
the results.


>>I have never felt very comfortable with this and am surprised that I
>>don't see a lot more trouble.  Mabye Cilkchess is 100 points weaker
>>because of this I don't really know.   It's silly that we haven't
>>just done the lock because it's trivial to implement.  The issue of
>>course is will there be a substantial performance penalty.  There
>>may be when we use a lot of processors.   I may see more problems
>>when I use more processors too.
>>
>>What have you noticed concerning performance penalties?   I know you
>>do not hash into the quies search so it might be less of an issue
>>for you.  We can always back off an extra level or two to make up
>>for the performance hit if it's substantial.
>
>
>I am doing a 'lock' followed by four consecutive 32 bit load instructions,
>followed by an unlock.  Or a lock followed by four consecutive store
>instructions, followed by the unlock.  I don't notice _any_ performance issue
>here at all, considering that these machines (my xeon/400) has a clock cycle
>of 2.5ns, can execute 2 instructions per cycle best case, which gives me a
>potential of 800M instructions per second.  I am doing about 200K nodes per
>second, which translates into about 4000 instructions/node.  Those 4 load or
>store instructions are a small fraction, so that serialization doesn't hurt,
>until maybe large numbers of processors... but then I would probably go to a
>lock for 'chunks' of the hash table rather than one lock for each of the two
>tables I have...

---- more snips ------

>>There is also another thing that might be a factor.  The real issue
>>when it come to any hash table conflict should not be how often it
>>affects some node, but how often does it change the root move?  We
>>figured out that this is worth several bits.   I did a 32 bit key
>>version of a previous program and compared it to a 64 bit key version
>>by a lot of self test games and found that I could not observe that
>>the program was weaker despite the fact that it was computed that we
>>should be getting an illegal match at least once every second or two.
>
>
>While I agree, are you willing to put the final round of a world championship
>game in the hands of "Murphy"???   Not me.. :)

Actually, YES.  I am horribly pragmatic, but am gradually changing as
I get older.  For instance if the lock slowed down the program 6 percent,
but a combination of testing and my best guess judgement calls indicated
that I would lose 1/2 point every 30 games because of it,  I would accept
the risk because it means the program is slightly stronger.  I don't
really feel that way any more, but I did just a few short years ago.
My current attitude is to accept a little slowdown and weakening of the
program for correctness and peace of mind.    Also, for something like
this,  the difference is too small to make an accurate judgment call
about anyway,  I would now rather error on the safe side.   But if I
knew for an absolute fact that the "error" made a stronger program,
I would indeed play poker with Murphy and take my chances!

But any tournament is up to Murphy anyway.  All you can ever do is
make your program as strong as possible (to give you the very best
chance of winning) and the rest is a gamble.  If you played the
last world championship over a million times, even the weakest
programs would win a few of them by sheer luck.

If I have to play against Crafty in June,  I'm going to be superstitious
now and turn on locking for our game, even if I don't turn it on for
the others.  It would really be embarassing after this conversation to
lose to you for this particular reason.   I will balance the possibility
of extreme embarassment against the slight weakening that a perfectly
correct program might give me!

I think many other programmers are more pragmatic than either of us.
I remember that Marty used to allocate his time at important tournaments
right down to the second.  He played a game against us where he made
his last move of the time control at the last possible second.  It was
clear after talking to him about this that he was just being pragmatic,
a few extra seconds for thinking was balanced (in his mind) against
the possibility of losing due to time forfeit.  He probably extracted
at least 1 or 2 rating point for this over us.  And we (me and Larry)
also liked to cut it close but not quite as much as Marty did.


- Don



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.