Computer Chess Club Archives


Subject: Re: quarkx v monsoon-ccct4

Author: Andrew Williams

Date: 03:41:40 01/21/02


On January 20, 2002 at 22:16:08, Scott Gasch wrote:

>On January 20, 2002 at 19:48:36, Claudio Della Corte wrote:
>
>>On January 20, 2002 at 17:24:56, Scott Gasch wrote:
>>
>>[...]
>>>Move 118 is the bug.  No idea what it was, looks to be some kind of hash
>>>problem.  I can't get it to reproduce.
>>
>>It looks very much like a bug due to the TBs, which must be handled carefully
>>if the X-men set is not complete. Just a guess.
>>Claudio
>
>Well, I'm on the hunt tonight.  I guess I did a stupid thing: shortly after this
>happened I decided monsoon was totally messed up and that I needed to kill and
>restart it before the next round.  Well, that overwrote the logfile with the
>evidence in it... dumb.  So the first thing I've done is get rid of my stupid
>code that overwrites logfiles and make it use a numbering scheme.  A little late,
>huh?
>
>I can't reproduce the move by inputting the PGN or just running the FEN.  I also
>tried running the positions in that game near the blunder with a full paranoid
>build (which runs at about 100nps because of all the stuff it checks) and came
>up with nothing.
>
>So I am left to speculate here.  My first instinct is a hash bug, so I've looked
>over my hash code very carefully, added a bunch of asserts, etc.  I think I may
>have found a problem and I've got an int 3 on it... if it happens I'll know.
>
>Next thing is the egtb files themselves.  This is where I could use some help.
>I am now starting to see the reason for Bruce's "paranoia" about code he didn't
>write... I turned on TB_CRC_CHECK in my code that probes Eugene's tables as well
>as in Eugene's egtb.cpp.  No CRC problems to report.  I am going to grab a
>source version of Crafty and make sure this egtb code hasn't changed since I
>stole it and merged it into my engine.
>
>The last thing I can come up with is a bit flip in memory.  Yes, you think I am
>crazy, but debugging crashed kernels at work I have seen this before, albeit
>rarely.  I've got a machine in my office where I can tell you the physical address
>it happens at and which bit will get asserted.  Anyway, I have a stupid little
>memory check utility that I wrote (it locks a huge buffer in physical memory and
>runs pattern tests on it) that will run on monsoon's hardware overnight.  I don't
>expect to find anything... for one, I can't get all the memory (drivers / kernel
>need some, though I can get about 85% of it) and for two, the memory system of my
>PC is a hell of a lot better tested than my chess engine.  So this is a shot in
>the dark.
>
>I also went to every place in my code where I typecast something and double
>checked the asserts above the cast.  I was thinking I could have dropped a sign
>bit in a cast or something, but this seems unlikely now.
>
>I'm more than open to suggestions about what else to try -- I'd be grateful for
>any ideas.
>
>Thanks,
>Scott
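
[Editor's sketch] The logfile numbering scheme mentioned in the quoted message could look roughly like the following. This is only an illustration in Python, not monsoon's actual code: the names next_log_path and open_new_log, the "monsoon" stem and the stem.NNN.log naming pattern are all assumptions made for the example.

```python
import os


def next_log_path(directory, stem="monsoon", ext=".log"):
    """Return the first unused path of the form stem.NNN.ext in directory.

    The stem and the three-digit numbering are illustrative assumptions.
    """
    n = 0
    while True:
        path = os.path.join(directory, f"{stem}.{n:03d}{ext}")
        if not os.path.exists(path):
            return path
        n += 1


def open_new_log(directory, stem="monsoon", ext=".log"):
    """Create and open a fresh log file without truncating an existing one."""
    while True:
        path = next_log_path(directory, stem, ext)
        try:
            # Mode "x" raises FileExistsError instead of overwriting, so a
            # file created between the check and the open is never clobbered.
            return open(path, "x")
        except FileExistsError:
            continue
```

Opening with exclusive-create mode rather than plain write mode means that even a second engine instance started in the same directory cannot destroy the evidence from an earlier run.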

In the past when I've had "mystery" moves appear on the board, it has often
been because of an error in picking up my PV. I get my PV from my hash-table
which is probably different from how you do it, but it might be relatively
easy to check this code rather than trying to re-create a hash-table bug from
120 moves into a game. Speaking of which, there are no move-list problems here?
Obviously, monsoon plays a lot on ICC so it's unlikely, but 120 moves is a lot.
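
To make the PV check concrete, here is a minimal sketch of walking a hash table to collect the PV while validating each stored move. It is written in Python for brevity and is not my engine's code or monsoon's: the function extract_pv, the dict-based table, and the toy "game" (positions are integers, moves add 1 or 2, nothing may exceed 5) are all assumptions made purely for illustration.

```python
def toy_legal_moves(pos):
    """Toy rules (hypothetical): a move adds 1 or 2, and 5 is the maximum."""
    return [m for m in (1, 2) if pos + m <= 5]


def toy_make_move(pos, move):
    """Apply a move in the toy game."""
    return pos + move


def extract_pv(table, root, legal_moves, make_move, max_len=64):
    """Follow stored best moves from root and return (pv, stop_reason).

    table maps a position key to the best move stored for it. The walk
    stops on a missing entry, on a stored move that is not legal in the
    current position (a sign of a key collision or a corrupted entry),
    on a repeated position, or after max_len moves.
    """
    pv = []
    seen = {root}
    pos = root
    while len(pv) < max_len:
        move = table.get(pos)
        if move is None:
            return pv, "no entry"
        if move not in legal_moves(pos):
            return pv, "illegal move in table"
        pos = make_move(pos, move)
        pv.append(move)
        if pos in seen:
            return pv, "repetition"
        seen.add(pos)
    return pv, "max length"


# The entry for position 3 holds a move that is illegal there.
table = {0: 2, 2: 1, 3: 7}
print(extract_pv(table, 0, toy_legal_moves, toy_make_move))
# prints ([2, 1], 'illegal move in table')
```

The point is that a PV walker which trusts the table blindly will happily play or print whatever move a bad entry contains, so logging the stop reason is a cheap way to see whether the PV code ever meets an illegal move.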

I for one would be very interested to hear about anything you find...


Andrew




Last modified: Thu, 15 Apr 21 08:11:13 -0700
