Author: Pat King
Date: 07:13:50 02/10/01
On February 10, 2001 at 02:34:35, Dan Newman wrote:

>On February 09, 2001 at 19:34:03, Ricardo Gibert wrote:
>
>>On February 09, 2001 at 19:04:45, David Rasmussen wrote:
>>
>>>On February 09, 2001 at 10:39:09, Pat King wrote:
>>>
>>>>On February 07, 2001 at 10:59:31, Pat King wrote:
>>>>
>>>>>I have seen it written here that with 64 bit Zobrist hashing, the perfect
>>>>>key should change 32 bits. When I had what I thought were hashing problems,
>>>>>I captured some stats on my hash keys. I found that most of them changed
>>>>>28-36 bits (within 4 of 32) with a few outliers as far as 13 bits from
>>>>>"perfection". I also checked that I was not generating duplicate keys. How
>>>>>good or bad is this? Should I work on the average, or the outliers? Any
>>>>>comments appreciated :)
>>>>>
>>>>>Pat
>>>>
>>>>Thanks to all for your thoughtful replies. For what it's worth, the only
>>>>change I've made is to generate a key set with hamming distances of 31-33,
>>>>with a significant improvement in hash performance (thanks to Ricardo for
>>>>providing a reasonable argument to justify the 32 bit standard). I have yet
>>>>to compare this set with Rob's 16 bit criterion for XORing any 2 keys.
>>>>
>>>>Pat
>>>
>>>But 32 isn't optimal. As high as possible is optimal.
>>
>>So far the conclusions I've come up with are:
>>
>>(1) A key = 0 should be excluded, since positions with a particular piece on
>>a particular square would produce the same hash signature as the same
>>position without that piece.
>>(2) Keys with a hamming distance close to zero should also be excluded.
>>(3) Pairs of keys of the form x and ~x should be excluded, since 2 such pairs
>>will XOR to zero.
>>(4) Pairs whose hamming distance is close to that of x and ~x (i.e., close to
>>64) should also be excluded.
>>(5) A key set with hamming distances > 21 and < 43 with respect to each other
>>*and* to zero will satisfy (1)-(4) above and can be quickly produced within
>>the program without having to read in a file of keys. Just use the same
>>random number seed.
>>
>>I'm trying to find sources on the net to verify these conclusions, but so far
>>no luck. What do you think?
>
>I agree. Here's another way of looking at it--nothing really rigorous,
>though. Numbers that flip too many bits are just as bad as numbers that flip
>too few. Two such numbers (that each flip lots of bits) XOR'ed into the
>hashcode will be equivalent to XOR'ing a single number that flips only a few
>bits, since the second number will tend to undo what the first did.

Which is why I was aiming for 32 per key, neither too much nor too little.
It's definitely an improvement over what I had.

>You really want each number to scramble the hashcode as much as possible so
>that positions that are "near" each other get very different hashcodes. (The
>idea being that all the positions under consideration during a search will
>sort of be near each other, relatively speaking. If we ensure they map to
>very different hashcodes, then we will tend to avoid false matches despite
>the tremendous oversubscription to each hashcode.)
>
>If you have two numbers that flip lots of bits (have lots of ones), then
>they'll have lots of bits in common and a small hamming distance. If two
>numbers are very different (approaching x and ~x), then their XOR will have
>lots of ones. So the effect of using numbers with too large a hamming
>distance is nearly the same as that of numbers with too small a hamming
>distance.
>
>My intuition tells me to split the difference and go with a hamming distance
>of 32 as optimum. Getting a set of 800 numbers all with a hamming distance
>of 32 from each other is likely (mathematically) impossible. And anyway,
>their pairwise combination won't have that characteristic...

Actually, I believe it will. Look at it bit by bit: with a hamming distance
of 32 out of 64, each bit of either key is 0 or 1 with P(0) = P(1) = 0.5, so
each of the four (A,B) bit combinations has probability 0.25. In the XOR
truth table, two of the four combinations give 0 and two give 1, so each XOR
result has P = 0.25 + 0.25 = 0.5, and the XOR of two such keys again averages
32 set bits. So the 32 hamming distance is, in some sense, stable under XOR.
Whether this is good, bad, or indifferent...

>I guess what you really need is a set of numbers that are as "linearly
>independent" as possible, so that it takes the combination of many of them to
>reproduce one of them. If you only had 64 different features to encode, you
>could have each of these numbers be completely independent of the others (not
>expressible as an XOR combination of the others), and each resultant hashcode
>would map one-to-one with the positions so encoded. But we have 800 possible
>features and don't want to use 800 bits.
>
>This error-correcting code (BCH) idea sounds interesting. Maybe someone will
>generate a nice optimal set of 800+ numbers that we all could use :).
>
>-Dan.
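
P.S. For anyone who wants to experiment with Ricardo's (1)-(5) above, here is
a rough C sketch of the kind of generator I mean. The xorshift PRNG, the
fixed seed, the 22-42 window, and the 800-key count are just my choices for
illustration, not anything settled in this thread:

#include <stdint.h>

#define NKEYS    800   /* roughly 12 piece types x 64 squares + castling/ep/side */
#define MIN_DIST 22    /* Ricardo's "> 21" */
#define MAX_DIST 42    /* Ricardo's "< 43" */

static uint64_t state = 0x9E3779B97F4A7C15ULL;  /* fixed seed: same keys every run */

/* Marsaglia xorshift64 -- any decent 64-bit generator will do */
static uint64_t rand64(void)
{
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }   /* clear lowest set bit */
    return n;
}

uint64_t zobrist[NKEYS];

void init_keys(void)
{
    int i, j, ok;
    uint64_t cand;

    for (i = 0; i < NKEYS; i++) {
        do {
            cand = rand64();
            /* popcount is the hamming distance to zero, covering (1) and (2);
               since dist(a, ~b) = 64 - dist(a, b), capping the distance at
               MAX_DIST also keeps us away from complement pairs, (3) and (4) */
            ok = popcount64(cand) >= MIN_DIST && popcount64(cand) <= MAX_DIST;
            for (j = 0; ok && j < i; j++) {
                int d = popcount64(cand ^ zobrist[j]);
                if (d < MIN_DIST || d > MAX_DIST)
                    ok = 0;
            }
        } while (!ok);
        zobrist[i] = cand;
    }
}

It's brute-force rejection sampling, so the later keys take more tries as the
constraints pile up, but it only runs once at startup, and with a fixed seed
you never have to ship a key file.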
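The "stable under XOR" argument above is also easy to check empirically. This
little throwaway harness is my own, nothing standard; it just averages the
popcount of the XOR of random 64-bit pairs:

#include <stdint.h>
#include <stdio.h>

static uint64_t state = 88172645463325252ULL;   /* any nonzero seed */

static uint64_t rand64(void)   /* same xorshift64 as above */
{
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

int main(void)
{
    const int trials = 100000;
    long sum = 0;
    int i;

    for (i = 0; i < trials; i++)
        sum += popcount64(rand64() ^ rand64());

    /* each XOR'ed bit is 1 with P = 0.25 + 0.25 = 0.5, so expect ~32 */
    printf("mean hamming distance: %.3f\n", (double)sum / trials);
    return 0;
}

If the argument is right, this should print something very close to 32.000:
XOR doesn't drift the expected distance toward 0 or 64.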
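As for Dan's linear independence point: over GF(2), XOR is addition, so the
standard way to measure a key set is Gaussian elimination. A sketch of my
own, under the same 64-bit assumptions:

#include <stdint.h>

/* Rank of n 64-bit keys over GF(2). Each key is a vector of 64 bits;
   "independent" means not expressible as the XOR of any subset of the
   others. */
int gf2_rank(const uint64_t *keys, int n)
{
    uint64_t basis[64] = {0};   /* basis[b] has its top set bit at position b */
    int rank = 0, i, b;

    for (i = 0; i < n; i++) {
        uint64_t v = keys[i];
        for (b = 63; b >= 0; b--) {
            if (!((v >> b) & 1))
                continue;
            if (basis[b])
                v ^= basis[b];      /* eliminate bit b, keep reducing */
            else {
                basis[b] = v;       /* a new independent direction */
                rank++;
                break;
            }
        }
        /* if v reduced to 0, keys[i] was an XOR combination of earlier keys */
    }
    return rank;
}

With 800 keys the rank tops out at 64, so dependencies are unavoidable; the
best a set can do is make every dependency involve many keys, which is
exactly the minimum-distance property a BCH-style construction would buy.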