Author: Andrew Williams
Date: 03:41:40 01/21/02
On January 20, 2002 at 22:16:08, Scott Gasch wrote:

>On January 20, 2002 at 19:48:36, Claudio Della Corte wrote:
>
>>On January 20, 2002 at 17:24:56, Scott Gasch wrote:
>>
>>[...]
>>>Move 118 is the bug. No idea what it was, looks to be some kind of hash
>>>problem. I can't get it to reproduce.
>>
>>It seems pretty much a bug due to TB, that must be handled carefully if not X
>>men complete. Just a guess.
>>Claudio
>
>Well I'm on the hunt tonight. I guess I did a stupid thing shortly after this
>happened I decided monsoon was totally messed up and I needed to kill it /
>restart before the next round. Well that overwrote the logfile with the
>evidence in it... dumb. So the first thing I've done is get rid of my stupid
>code that overwrites logfiles and make it do a numbering scheme. A little late
>huh?
>
>I can't reproduce the move by inputting the PGN or just running the FEN. I also
>tried running the positions in that game near the blunder with a full paranoid
>build (which is about 100nps because of all the stuff it checks) and come up
>with nothing.
>
>So I am left to speculate here. My first instinct is a hash bug so I've looked
>over my hash code very carefully, added a bunch of asserts, etc. I think I may
>have found a problem and I've got an int 3 on it.. if it happens I'll know.
>
>Next thing is the egtb files themselves. This is where I could use some help.
>I now am starting to see the reason for Bruce's "paranoia" about code he didn't
>write... I turned on TB_CRC_CHECK in my code that probes Eugene's tables as well
>as in eugene's egtb.cpp. No CRC problems to report. I am going to grab a
>source version of crafty and make sure this egtb code hasn't changed since I
>stole it and merged it into my engine.
>
>The last thing I can come up with is a bit flip in memory. Yes you think I am
>crazy but debugging crashed kernels at work I have seen this before, albeit
>rare. I've got a machine in my office where I can tell you the physical address
>it happens at and which bit will get asserted. Anyway, I have a stupid little
>memory check utility (it locks a huge buffer in physical memory and runs pattern
>tests on it) that I wrote will run on monsoon's hardware overnight. I don't
>expect to find anything... for one I can't get all the memory (drivers / kernel
>need some though I can get about 85% of it) and for two the memory system of my
>PC is a hell of a lot better tested than my chess engine. So this is a shot in
>the dark.
>
>I also went to every place in my code where I typecast something and double
>checked the asserts above the cast. I was thinking I could have dropped a sign
>bit in a cast or something but this seems unlikely now.
>
>I'm more than open to suggestions about what else to try -- I'd be grateful for
>any ideas.
>
>Thanks,
>Scott

In the past when I've had "mystery" moves appear on the board, it has often been because of an error in picking up my PV. I get my PV from my hash-table which is probably different from how you do it, but it might be relatively easy to check this code rather than trying to re-create a hash-table bug from 120 moves into a game.

Speaking of which, there are no move-list problems here? Obviously, monsoon plays a lot on ICC so it's unlikely, but 120 moves is a lot.

I for one would be very interested to hear about anything you find...

Andrew
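
A minimal sketch of the kind of PV sanity check suggested above, in Python for brevity. Everything in it is hypothetical and is not taken from monsoon or any other engine: the hash table is assumed to be a plain mapping from position to stored best move, `legal_moves` is an assumed helper returning the legal moves of a position together with the positions they lead to, and the "game" is a toy counter rather than chess. The idea is only that each move pulled out of the table gets verified against the current position before it is trusted, and that the walk stops on a repeated position.

```python
def legal_moves(pos):
    """Toy stand-in for move generation (hypothetical): positions are the
    integers 0..10 and a move adds or subtracts a small step."""
    if not 0 <= pos < 10:
        return {}
    moves = {"+1": pos + 1, "+2": pos + 2}
    if pos > 0:
        moves["-1"] = pos - 1
    return moves


def extract_pv(root, table, legal_moves, max_len=64):
    """Walk the hash table from `root` and collect the PV.

    Returns (pv, problem). `problem` is None when the walk ended normally,
    otherwise a short description of what looked wrong.
    """
    pv = []
    seen = set()
    pos = root
    while len(pv) < max_len:
        if pos in seen:
            # The table led us back to a position already on the PV.
            return pv, "cycle"
        seen.add(pos)
        move = table.get(pos)
        if move is None:
            # No entry for this position: normal end of the PV.
            return pv, None
        successors = legal_moves(pos)
        if move not in successors:
            # Stale or colliding entry: the stored move is not legal here.
            return pv, f"illegal move {move!r} in position {pos!r}"
        pv.append(move)
        pos = successors[move]
    return pv, None


good_table = {0: "+2", 2: "+1", 3: "+2"}
bad_table = {0: "+2", 2: "x9"}       # second entry holds a bogus move
loop_table = {0: "+1", 1: "-1"}      # entries that point back at each other

print(extract_pv(0, good_table, legal_moves))
print(extract_pv(0, bad_table, legal_moves))
print(extract_pv(0, loop_table, legal_moves))
```

In a real engine the legality test would go through the engine's own move generator, and a failure here would be a natural place for an assert or a log line.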
Last modified: Thu, 15 Apr 21 08:11:13 -0700