Computer Chess Club Archives


Subject: Re: quarkx v monsoon-ccct4

Author: Scott Gasch

Date: 19:16:08 01/20/02


On January 20, 2002 at 19:48:36, Claudio Della Corte wrote:

>On January 20, 2002 at 17:24:56, Scott Gasch wrote:
>
>[...]
>>Move 118 is the bug.  No idea what it was, looks to be some kind of hash
>>problem.  I can't get it to reproduce.
>
>It seems pretty much a bug due to TB, that must be handled carefully if not X
>men complete. Just a guess.
>Claudio

Well, I'm on the hunt tonight.  I guess I did a stupid thing: shortly after this
happened I decided monsoon was totally messed up and that I needed to kill and
restart it before the next round.  Well, that overwrote the logfile with the
evidence in it... dumb.  So the first thing I've done is get rid of my stupid
code that overwrites logfiles and make it use a numbering scheme.  A little
late, huh?
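
For anyone who wants the same safety net, a numbering scheme could look roughly
like the sketch below.  This is not monsoon's actual code: the helper name
open_numbered_log, the "<base>.NNN.log" naming, and the 1000-file limit are all
made up for illustration.  It leans on the C11 "x" flag to fopen, which refuses
to open a file that already exists.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open the first unused "<base>.NNN.log" for writing.
   The chosen file name is written into name[].  Returns NULL if the name
   does not fit in the buffer or no slot in 000..999 could be opened. */
FILE *open_numbered_log(const char *base, char *name, size_t name_len)
{
    assert(base != NULL && name != NULL);

    for (int n = 0; n < 1000; n++) {
        int len = snprintf(name, name_len, "%s.%03d.log", base, n);
        if (len < 0 || (size_t)len >= name_len)
            return NULL;            /* name would have been truncated */

        /* "wx" (C11): create the file, but fail if it already exists,
           so an older log is never truncated. */
        FILE *fp = fopen(name, "wx");
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```

Calling it once at startup gives each run of the engine its own file, so a
restart between rounds can no longer wipe out the previous game's log.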

I can't reproduce the move by inputting the PGN or just running the FEN.  I also
tried running the positions in that game near the blunder with a full paranoid
build (which runs at about 100 nps because of all the stuff it checks) and came
up with nothing.

So I am left to speculate here.  My first instinct is a hash bug so I've looked
over my hash code very carefully, added a bunch of asserts, etc.  I think I may
have found a problem and I've got an int 3 on it... if it happens I'll know.
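
To give an idea of the kind of asserts that can go around hash code, here is a
minimal sketch.  It is not monsoon's hash table: the entry layout, TT_SIZE,
SCORE_INF and the tt_store / tt_probe names are assumptions picked for the
example.  The idea is to keep the full 64-bit signature in the entry, so an
index collision is never mistaken for a hit, and to check the same invariants
on the way in and on the way out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TT_SIZE   1024u     /* illustrative size; must be a power of two */
#define SCORE_INF 32000     /* assumed bound on any legal score */

typedef struct {
    uint64_t key;           /* full signature, not just the index bits */
    int16_t  score;
    uint8_t  depth;
    uint8_t  from, to;      /* squares 0..63 */
    uint8_t  used;          /* 0 = empty slot */
} tt_entry;

static tt_entry tt[TT_SIZE];

void tt_store(uint64_t key, int score, int depth, int from, int to)
{
    /* Catch garbage before it gets into the table. */
    assert(score >= -SCORE_INF && score <= SCORE_INF);
    assert(depth >= 0 && depth <= 255);
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);

    tt_entry *e = &tt[key & (TT_SIZE - 1)];
    e->key   = key;
    e->score = (int16_t)score;
    e->depth = (uint8_t)depth;
    e->from  = (uint8_t)from;
    e->to    = (uint8_t)to;
    e->used  = 1;
}

const tt_entry *tt_probe(uint64_t key)
{
    const tt_entry *e = &tt[key & (TT_SIZE - 1)];
    if (!e->used || e->key != key)
        return NULL;        /* empty slot, or a different position */

    /* Re-check on the way out: a failure here means the entry was
       corrupted after it was stored. */
    assert(e->score >= -SCORE_INF && e->score <= SCORE_INF);
    assert(e->from < 64 && e->to < 64);
    return e;
}
```

A failing assert in tt_probe but not in tt_store would point at something
scribbling on the table after the store, which narrows the search a lot.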

The next thing is the egtb files themselves.  This is where I could use some
help.  I am now starting to see the reason for Bruce's "paranoia" about code he
didn't write... I turned on TB_CRC_CHECK in my code that probes Eugene's tables
as well as in Eugene's egtb.cpp.  No CRC problems to report.  I am going to grab
a source version of Crafty and make sure this egtb code hasn't changed since I
stole it and merged it into my engine.

The last thing I can come up with is a bit flip in memory.  Yes, you think I am
crazy, but debugging crashed kernels at work I have seen this before, albeit
rarely.  I've got a machine in my office where I can tell you the physical
address it happens at and which bit will get asserted.  Anyway, I wrote a stupid
little memory check utility (it locks a huge buffer in physical memory and runs
pattern tests on it) that will run on monsoon's hardware overnight.  I don't
expect to find anything... for one, I can't get all the memory (the drivers and
kernel need some, though I can get about 85% of it), and for two, the memory
system of my PC is a hell of a lot better tested than my chess engine.  So this
is a shot in the dark.
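
The pattern-test part of such a utility can be sketched in a few lines.  This
is a guess at the general shape, not the utility described above: the mem_fill
and mem_check names are invented, and the step that locks the buffer into
physical memory is left out because it is OS-specific.  Alternating the pattern
with its complement means every bit gets tested both as a 0 and as a 1 once the
test is run with a pattern and then with its inverse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write pattern to even words and ~pattern to odd words. */
void mem_fill(volatile uint64_t *buf, size_t words, uint64_t pattern)
{
    assert(buf != NULL);
    for (size_t i = 0; i < words; i++)
        buf[i] = (i & 1) ? ~pattern : pattern;
}

/* Read the buffer back and count words that no longer match.  For the
   first bad word, report its index and the XOR of expected vs. actual,
   i.e. exactly which bits flipped. */
size_t mem_check(const volatile uint64_t *buf, size_t words, uint64_t pattern,
                 size_t *first_bad, uint64_t *flipped_bits)
{
    assert(buf != NULL && first_bad != NULL && flipped_bits != NULL);

    size_t errors = 0;
    for (size_t i = 0; i < words; i++) {
        uint64_t expect = (i & 1) ? ~pattern : pattern;
        uint64_t diff = buf[i] ^ expect;
        if (diff != 0) {
            if (errors == 0) {
                *first_bad = i;
                *flipped_bits = diff;
            }
            errors++;
        }
    }
    return errors;
}
```

A real overnight run would loop over several patterns (0x00.., 0xFF.., 0xAA..,
0x55.., walking ones) and let some time pass between fill and check.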

I also went to every place in my code where I typecast something and
double-checked the asserts above the cast.  I was thinking I could have dropped
a sign bit in a cast or something, but this seems unlikely now.
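
One way to keep the assert and the cast from drifting apart is to put both in
one small helper.  The sketch below is only an illustration; the to_i16 and
to_u8 names are hypothetical and not taken from any engine.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow an int to int16_t; in a debug build, trap if the value would
   not survive the cast (for example a score outside the 16-bit range). */
static inline int16_t to_i16(int v)
{
    assert(v >= INT16_MIN && v <= INT16_MAX);
    return (int16_t)v;
}

/* Narrow an int to uint8_t; a negative value here would silently turn
   into a large positive one without the assert. */
static inline uint8_t to_u8(int v)
{
    assert(v >= 0 && v <= UINT8_MAX);
    return (uint8_t)v;
}
```

With NDEBUG defined the asserts compile away and the helpers are plain casts,
so a release build pays nothing for them.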

I'm more than open to suggestions about what else to try -- I'd be grateful for
any ideas.

Thanks,
Scott



Last modified: Thu, 15 Apr 21 08:11:13 -0700
