Author: Don Dailey
Date: 20:29:16 01/31/99
Go up one level in this thread
On January 31, 1999 at 21:43:25, Robert Hyatt wrote: >On January 31, 1999 at 16:54:55, Don Dailey wrote: > >>On January 31, 1999 at 11:27:23, Robert Hyatt wrote: >> >>>Don... was cleaning out some files on Friday, and I found something about >>>cilkchess I had printed somewhere in the past, a discussion about cilkchess >>>and its implementation details on a SMP platform. >>> >>>One thing stood out like a sore thumb that I wanted to ask about... You >>>mentioned that you don't 'lock the hash table at all'. And I'm wondering how >>>you make that work. >>> >>>IE suppose you have a simple implementation that requires two words, one for >>>the 'signature' and one for the 'value'. Since this is two words long, it >>>takes two instructions to store the entry and two to access the entry. What >>>do you do when the 'reader' reads the first word, but gets blocked by an >>>interrupt (or bus conflict or whatever) so that the 'writer' stores a new >>>signature _and_ value? Because now the reader will get the second word and >>>it does _not_ go with the first. So you get a wrong value. >>> >>>Harry and I spent several weeks debugging this in Cray Blitz way back, as I >>>was getting 'bad move from hash table' messages regularly, but only when >>>using more than one processor, so I was _certain_ it wasn't a real hash >>>signature collision. We tracked it down to the above problem, and put a >>>'lock' around the few instructions that load or store. IE here is what I do in >>>Crafty now: >>> >>>HashProbe: >>> >>> Lock(lock_hasha); >>> word1=htable->word1; >>> word2=htable->word2; >>> UnLock(lock_hasha); >>> >>>'word1 and word2' are local, while htable-> points to a globally shared hash >>>table. >>> >>>HashStore: >>> >>> Lock(lock_hashb); >>> htableb->word1=word1; >>> htableb->word2=word2; >>> UnLock(lock_hashb); >>> >>>If I remove the locks, I get _serious_ problems, because the signatures will >>>match, but the 'best move' will fail validation. Which also means that the >>>score, the draft, the threat flags, etc are _all_ wrong for this position. >>> >>>Do you fix this? Do you even detect this? Or do you just ignore it, which >>>would seem to be impossible based on tests with CB and Crafty??? >>> >>>Bob >> >>HI Bob, >> >>This one has been bothering me for a long time. But we do absolutely >>nothing. We are completely aware of the possible difficulties but I >>have never traced any problems to it. >> > >do you grab the hash move and verify it? This is where I found the problem >in Cray Blitz (I never had it in Crafty because after the very difficult task >of debugging that little oversight, I have never forgotten to do the lock >again.... Testing the move is like adding 2 or 3 bits (I'm guessing, but it's some small number of bits) to the hash key. It's a really good way to measure false matches in the hash table. If you can spare about 3 bits or so, you can pick up the majority of conflicts in the hash table. With 3 bits, you will identify a false match almost 90% of the time. It's a neat trick and testing the move is pretty much the same idea, except you get the bits for free since you store the move anyway. But it turns out that I don't test the move, I generate all the moves and push this one to the front if I run across it, similar to how one might do the killers. I could trivially do your test though and probably should. Technically, I should probably try the hash move before doing any work but I never found this to be worth anything for me. Is this what you do? I think it's because there is rarely a hash move near enough to the end nodes to do you much good. --- snipped my stuff ---- >I have no idea how many such glitches it takes to change a move at the root. >And I didn't catch the bug that way, I caught it because the hash table move >kept failing to 'verify' and I couldn't see how that could possibly happen >unless we were getting two identical hash signatures for two different >positions. My analysis convinced me this wasn't happening, because I could >search forever with 1 processor with no problems, but with more than 1, it would >produce errors, more processors, more errors... Thanks for alerting me to this. I'll do a test like you did and report the results. >>I have never felt very comfortable with this and am surprised that I >>don't see a lot more trouble. Mabye Cilkchess is 100 points weaker >>because of this I don't really know. It's silly that we haven't >>just done the lock because it's trivial to implement. The issue of >>course is will there be a substantial performance penalty. There >>may be when we use a lot of processors. I may see more problems >>when I use more processors too. >> >>What have you noticed concerning performance penalties? I know you >>do not hash into the quies search so it might be less of an issue >>for you. We can always back off an extra level or two to make up >>for the performance hit if it's substantial. > > >I am doing a 'lock' followed by four consecutive 32 bit load instructions, >followed by an unlock. Or a lock followed by four consecutive store >instructions, followed by the unlock. I don't notice _any_ performance issue >here at all, considering that these machines (my xeon/400) has a clock cycle >of 2.5ns, can execute 2 instructions per cycle best case, which gives me a >potential of 800M instructions per second. I am doing about 200K nodes per >second, which translates into about 4000 instructions/node. Those 4 load or >store instructions are a small fraction, so that serialization doesn't hurt, >until maybe large numbers of processors... but then I would probably go to a >lock for 'chunks' of the hash table rather than one lock for each of the two >tables I have... ---- more snips ------ >>There is also another thing that might be a factor. The real issue >>when it come to any hash table conflict should not be how often it >>affects some node, but how often does it change the root move? We >>figured out that this is worth several bits. I did a 32 bit key >>version of a previous program and compared it to a 64 bit key version >>by a lot of self test games and found that I could not observe that >>the program was weaker despite the fact that it was computed that we >>should be getting an illegal match at least once every second or two. > > >While I agree, are you willing to put the final round of a world championship >game in the hands of "Murphy"??? Not me.. :) Actually, YES. I am horribly pragmatic, but am gradually changing as I get older. For instance if the lock slowed down the program 6 percent, but a combination of testing and my best guess judgement calls indicated that I would lose 1/2 point every 30 games because of it, I would accept the risk because it means the program is slightly stronger. I don't really feel that way any more, but I did just a few short years ago. My current attitude is to accept a little slowdown and weakening of the program for correctness and peace of mind. Also, for something like this, the difference is too small to make an accurate judgment call about anyway, I would now rather error on the safe side. But if I knew for an absolute fact that the "error" made a stronger program, I would indeed play poker with Murphy and take my chances! But any tournament is up to Murphy anyway. All you can ever do is make your program as strong as possible (to give you the very best chance of winning) and the rest is a gamble. If you played the last world championship over a million times, even the weakest programs would win a few of them by sheer luck. If I have to play against Crafty in June, I'm going to be superstitious now and turn on locking for our game, even if I don't turn it on for the others. It would really be embarassing after this conversation to lose to you for this particular reason. I will balance the possibility of extreme embarassment against the slight weakening that a perfectly correct program might give me! I think many other programmers are more pragmatic than either of us. I remember that Marty used to allocate his time at important tournaments right down to the second. He played a game against us where he made his last move of the time control at the last possible second. It was clear after talking to him about this that he was just being pragmatic, a few extra seconds for thinking was balanced (in his mind) against the possibility of losing due to time forfeit. He probably extracted at least 1 or 2 rating point for this over us. And we (me and Larry) also liked to cut it close but not quite as much as Marty did. - Don
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.