Author: Vincent Diepeveen
Date: 17:30:15 10/23/02
Go up one level in this thread
On October 23, 2002 at 15:25:04, Colin Frayn wrote: yes i believe it. Crafty has the same problem. Diep eats any FEN/EPD position without crashing. but some years ago i had fed crafty also a FEN file and then it crashed and it crashed. I had compiled it. It took me days to debug. Then i mailed Bob and got back a day later that it was the FEN string which wasn't in EPD form. That solved all problems! A bugfree parser helps! >As most of you know, writing chess programs is fairly tricky. Weird bugs arise >to do with NULL move and hashtables that take ages to fix, especially when you >get lots of complicated algorithms all working against each other. > >So imagine my horror when Carlos at Chessbrain.net told me that Beowulf was >occasionally returning empty PV strings from searches, indicating that the >hashtable was broken (I get the PV from the hashtable directly). This annoyed >me especially as I'd not heard of this problem before and I realised I must have >broken something recently. > >I tried to verify this on my PC at home in Windows, but couldn't. Carlos tested >it and we discovered that the problem only ever occurred rarely, and only on >non-windows boxes. At this stage I was thinking 'compiler error?' or perhaps >'memory leak?' Dreaded memory leaks take ages to find. > >Anyway, I did a *lot* of testing, with the fabled bug proving exceptionally >elusive. Often it would fail for a few goes and then fix itself randomly. One >time I was trying to get the bug to work on this one position, and it did so for >the majority of the day, and then the following day that position was absolutely >fine - no probem at all, whatever I did. I began to wonder if it was something >to do with the random number code, which would be seeded differently every time >the program was run. I replaced it all. > >Carlos provided me with a new position that failed and so I started to try the >debugging. Eventually I found out that the root position was not being updated >properly. I altered the hash replacement scheme, altered the hash update >scheme, changed loads of things around in the search function, decided to store >the full 64-bit hash key instead of just a 40 bit safe key. basically, I spent >ages trying to work out what was wrong, but still no luck. > >After a *lot* of testing, I finally managed to track the bug down to the fact >that the hash key was becoming corrupted at some point during the search. I >began to test the DoMove() function, and also a few other things that could have >caused this. We installed electric fence and checked for memory leaks. No joy >(*sigh of relief*). Somehow the hash key stored in the Board structure (which >is continuously updated during DoMove()) had become corrupted so that it didn't >correspond to the current board position any more. > >Then I managed to add in some debugging code which quit as soon as the key >stored in the Board structure was no longer correct. I got Beo to print out the >position. I couldn't help but notice that in just three ply, black had made one >move but white had somehow moved his king about 4 squares. Castling problem? >By this time I had also replaced the entire random number generation code, and >added in debugging code all over the place to print out messages in case of >errors, but I felt I was getting closer. > >Then I suddenly realised. Carlos had been sending strings to the engine of the >form; >3r4/5b2/1k1r1p2/Np5p/4P1p1/2R1KPP1/2P4P/R7 w > >(i.e. missing off the last two dashes.) The full string should be >3r4/5b2/1k1r1p2/Np5p/4P1p1/2R1KPP1/2P4P/R7 w - - > >but of course the shortened version should still be valid. One of the positions >I had 'fixed' before started working mysteriously after I decided to add back in >those two dashes just for neatness. Another started working again for a short >while when I cut and pasted the entire line minus the end-of-line character, but >then failed again shortly after one I cut out the whole line including the EOL >char again. Of course I didn't notice these at the time because I was so fixed >on testing my hashtable. > >So what was the bug? My FEN parser screwed up when it encountered a DoS EOL >character whilst running on a non-DoS machine. It ended up interpreting it as a >totally spurious castling permission, sometimes allowing players to castle when >they shouldn't have been able to and therefore messing everything up, but >essentially at random, because the parser was just reading in the string for the >FEN past the end, and stopping when it came to a space character. If it met a >K,Q,k or q in that space it would update the castling permissions, but of course >this was random unallocated memory, so the result of this was also essentially >random, meaning that sometimes it failed, and sometimes it didn't. > >So after all of this, it was just me writing a bad parser ;) > >How annoying is that? :) >Can anyone beat this story? > >Cheers, >Col
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.