Computer Chess Club Archives


Subject: Re: quarkx v monsoon-ccct4

Author: Scott Gasch

Date: 10:22:21 01/21/02


On January 21, 2002 at 10:58:12, Ulrich Tuerke wrote:

>On January 20, 2002 at 22:16:08, Scott Gasch wrote:
>
>>On January 20, 2002 at 19:48:36, Claudio Della Corte wrote:
>>
>>>On January 20, 2002 at 17:24:56, Scott Gasch wrote:
>>>
>>>[...]
>>>>Move 118 is the bug.  No idea what it was, looks to be some kind of hash
>>>>problem.  I can't get it to reproduce.
>>>
>>>It looks very much like a bug due to TBs, which must be handled carefully if the
>>>X-men set is not complete. Just a guess.
>>>Claudio
>>
>>Well I'm on the hunt tonight.  I guess I did a stupid thing: shortly after this
>>happened I decided monsoon was totally messed up and I needed to kill it /
>>restart before the next round.  Well, that overwrote the logfile with the
>>evidence in it... dumb.  So the first thing I've done is get rid of my stupid
>>code that overwrites logfiles and make it use a numbering scheme.  A little late,
>>huh?
>>
>>I can't reproduce the move by inputting the PGN or just running the FEN.  I also
>>tried running the positions in that game near the blunder with a full paranoid
>>build (which runs at about 100nps because of all the stuff it checks) and came up
>>with nothing.
>>
>>So I am left to speculate here.  My first instinct is a hash bug so I've looked
>>over my hash code very carefully, added a bunch of asserts, etc.  I think I may
>>have found a problem and I've got an int 3 on it.. if it happens I'll know.
>>
>>Next thing is the egtb files themselves.  This is where I could use some help.
>>I now am starting to see the reason for Bruce's "paranoia" about code he didn't
>>write... I turned on TB_CRC_CHECK in my code that probes Eugene's tables as well
>>as in Eugene's egtb.cpp.  No CRC problems to report.  I am going to grab a
>>source version of Crafty and make sure this egtb code hasn't changed since I
>>stole it and merged it into my engine.
>>
>>The last thing I can come up with is a bit flip in memory.  Yes, you think I am
>>crazy, but debugging crashed kernels at work I have seen this before, albeit
>>rarely.  I've got a machine in my office where I can tell you the physical address
>>it happens at and which bit will get asserted.  Anyway, I have a stupid little
>>memory check utility that I wrote (it locks a huge buffer in physical memory and
>>runs pattern tests on it), which will run on monsoon's hardware overnight.  I don't
>>expect to find anything... for one, I can't get all the memory (drivers / kernel
>>need some, though I can get about 85% of it) and for two, the memory system of my
>>PC is a hell of a lot better tested than my chess engine.  So this is a shot in
>>the dark.
>>
>>I also went to every place in my code where I typecast something and double
>>checked the asserts above the cast.  I was thinking I could have dropped a sign
>>bit in a cast or something but this seems unlikely now.
>>
>>I'm more than open to suggestions about what else to try -- I'd be grateful for
>>any ideas.
>
>This reminds me of a problem which I had some time ago.
>The problem was related to storing mate values into the hash table.
>Usually, you add some increment ("ply" or so) to the mate value to correct for
>the distance to the root of the tree. Furthermore, you probably store a flag
>indicating whether it's a bound or an exact value.
>When storing the flag I compared the modified mate value against alpha and beta.
>This was of course wrong; I had to compare the original mate value.
>
>Just an idea,
>Uli
>

Thanks for the ideas everyone.

My first instinct with this is a hash bug too, but what Uli described does not
happen in my code.  For mate bounds I simply convert them to NMATE bounds.  For
exact mates I did not fall into this trap; I adjust for ply and store/retrieve.
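
For readers following along, here is a minimal sketch of the ply adjustment and
of the trap Uli describes.  It is not monsoon's code: the constants, the
function names and the score convention (mate in N plies from the root scores
MATE_SCORE - N) are all assumptions made for illustration, and it does not
cover the NMATE bound conversion mentioned above.

```c
#include <assert.h>

/* Hypothetical constants, not taken from any engine's source. */
#define MATE_SCORE 30000                    /* mate delivered at the root     */
#define MAX_PLY    128                      /* assumed maximum search depth   */
#define MATE_BOUND (MATE_SCORE - MAX_PLY)   /* beyond this, score is a mate   */

enum bound_type { BOUND_UPPER, BOUND_EXACT, BOUND_LOWER };

/* Convert a root-relative mate score to a node-relative one for storing.
 * Non-mate scores pass through unchanged. */
static int score_to_hash(int score, int ply)
{
    assert(ply >= 0 && ply < MAX_PLY);
    if (score >= MATE_BOUND)  return score + ply;
    if (score <= -MATE_BOUND) return score - ply;
    return score;
}

/* Inverse conversion when probing: make the stored mate score relative
 * to the root of the current search again. */
static int score_from_hash(int stored, int ply)
{
    assert(ply >= 0 && ply < MAX_PLY);
    if (stored >= MATE_BOUND)  return stored - ply;
    if (stored <= -MATE_BOUND) return stored + ply;
    return stored;
}

/* Choose the hash flag from the ORIGINAL search score.  Passing the
 * ply-adjusted value here is the bug Uli describes. */
static enum bound_type classify_bound(int score, int alpha, int beta)
{
    if (score <= alpha) return BOUND_UPPER;
    if (score >= beta)  return BOUND_LOWER;
    return BOUND_EXACT;
}
```

With a window of (MATE_SCORE - 12, MATE_SCORE - 8) and a score of
MATE_SCORE - 10 found at ply 4, classifying the original score gives an exact
entry, while classifying the adjusted score (MATE_SCORE - 6) would wrongly mark
it as a lower bound.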

I noticed the engine's score go from +MATE24 to +30 on the fateful move.
This is very strange; there should have been a number of simple 6-7 ply lines
that ended in a tablebase mate.  Therefore I think a hash collision can't be to
blame -- there should have been multiple winning lines in the hash and +30 is
way less than +MATE24.  The engine also avoided trading queens (which was
winning) for three moves.  A simple bit flip in memory can't be the explanation
either, as there should have been more than one line and thus more than one
position in the hash that showed a winning path after the queen trade.

As of this time I'm still hunting and have not been able to reproduce.

Scott


