Author: Robert Hyatt
Date: 18:43:25 01/31/99
Go up one level in this thread
On January 31, 1999 at 16:54:55, Don Dailey wrote: >On January 31, 1999 at 11:27:23, Robert Hyatt wrote: > >>Don... was cleaning out some files on Friday, and I found something about >>cilkchess I had printed somewhere in the past, a discussion about cilkchess >>and its implementation details on a SMP platform. >> >>One thing stood out like a sore thumb that I wanted to ask about... You >>mentioned that you don't 'lock the hash table at all'. And I'm wondering how >>you make that work. >> >>IE suppose you have a simple implementation that requires two words, one for >>the 'signature' and one for the 'value'. Since this is two words long, it >>takes two instructions to store the entry and two to access the entry. What >>do you do when the 'reader' reads the first word, but gets blocked by an >>interrupt (or bus conflict or whatever) so that the 'writer' stores a new >>signature _and_ value? Because now the reader will get the second word and >>it does _not_ go with the first. So you get a wrong value. >> >>Harry and I spent several weeks debugging this in Cray Blitz way back, as I >>was getting 'bad move from hash table' messages regularly, but only when >>using more than one processor, so I was _certain_ it wasn't a real hash >>signature collision. We tracked it down to the above problem, and put a >>'lock' around the few instructions that load or store. IE here is what I do in >>Crafty now: >> >>HashProbe: >> >> Lock(lock_hasha); >> word1=htable->word1; >> word2=htable->word2; >> UnLock(lock_hasha); >> >>'word1 and word2' are local, while htable-> points to a globally shared hash >>table. >> >>HashStore: >> >> Lock(lock_hashb); >> htableb->word1=word1; >> htableb->word2=word2; >> UnLock(lock_hashb); >> >>If I remove the locks, I get _serious_ problems, because the signatures will >>match, but the 'best move' will fail validation. Which also means that the >>score, the draft, the threat flags, etc are _all_ wrong for this position. >> >>Do you fix this? Do you even detect this? Or do you just ignore it, which >>would seem to be impossible based on tests with CB and Crafty??? >> >>Bob > >HI Bob, > >This one has been bothering me for a long time. But we do absolutely >nothing. We are completely aware of the possible difficulties but I >have never traced any problems to it. > do you grab the hash move and verify it? This is where I found the problem in Cray Blitz (I never had it in Crafty because after the very difficult task of debugging that little oversight, I have never forgotten to do the lock again.... >I did do a test to give me at lease some confidence that it was ok >to do this for the time being. I ran a test suite of problems for >many repetitions. I used about 20 problems, let them run for about >2 days and this represented a LOT of iterations running the same >problems. The problems were always solved on the same depth although >occasionally the scores varied very slightly (but this is common with >a parallel program.) That is except for a single problem which was >known anyway to have flaky behavior on that particular version of >Cilkchess. I even tested this by doing a simple test with a different >hash key generator (different table of random numbers.) I have no idea how many such glitches it takes to change a move at the root. And I didn't catch the bug that way, I caught it because the hash table move kept failing to 'verify' and I couldn't see how that could possibly happen unless we were getting two identical hash signatures for two different positions. My analysis convinced me this wasn't happening, because I could search forever with 1 processor with no problems, but with more than 1, it would produce errors, more processors, more errors... > >I have never felt very comfortable with this and am surprised that I >don't see a lot more trouble. Mabye Cilkchess is 100 points weaker >because of this I don't really know. It's silly that we haven't >just done the lock because it's trivial to implement. The issue of >course is will there be a substantial performance penalty. There >may be when we use a lot of processors. I may see more problems >when I use more processors too. > >What have you noticed concerning performance penalties? I know you >do not hash into the quies search so it might be less of an issue >for you. We can always back off an extra level or two to make up >for the performance hit if it's substantial. I am doing a 'lock' followed by four consecutive 32 bit load instructions, followed by an unlock. Or a lock followed by four consecutive store instructions, followed by the unlock. I don't notice _any_ performance issue here at all, considering that these machines (my xeon/400) has a clock cycle of 2.5ns, can execute 2 instructions per cycle best case, which gives me a potential of 800M instructions per second. I am doing about 200K nodes per second, which translates into about 4000 instructions/node. Those 4 load or store instructions are a small fraction, so that serialization doesn't hurt, until maybe large numbers of processors... but then I would probably go to a lock for 'chunks' of the hash table rather than one lock for each of the two tables I have... > >When I do the lock test I'll inform you of the results. What do you >actually do to observe this problem? Do you just put in code that >observes the conflict? I can comment out the lock/unlocks and I get lots of 'bad move from hash table' errors when the hash move fails validation... > >There is also another thing that might be a factor. The real issue >when it come to any hash table conflict should not be how often it >affects some node, but how often does it change the root move? We >figured out that this is worth several bits. I did a 32 bit key >version of a previous program and compared it to a 64 bit key version >by a lot of self test games and found that I could not observe that >the program was weaker despite the fact that it was computed that we >should be getting an illegal match at least once every second or two. While I agree, are you willing to put the final round of a world championship game in the hands of "Murphy"??? Not me.. :) > >It may be similar with this Bob. Pehaps we are getting a relatively >high number of these bad matches but maybe it has relatively little >affect on the performance? If this is true, then we might actually >take a significant performance penalty for "fixing" the problem! > >It's an ugly thought, isn't it? I think this might be a case where >peace of mind is more important than a couple of rating points! > > > >- Don
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.