Computer Chess Club Archives



Subject: Re: Hash codes - how good is good enough?

Author: Pat King

Date: 07:13:50 02/10/01



On February 10, 2001 at 02:34:35, Dan Newman wrote:

>On February 09, 2001 at 19:34:03, Ricardo Gibert wrote:
>
>>On February 09, 2001 at 19:04:45, David Rasmussen wrote:
>>
>>>On February 09, 2001 at 10:39:09, Pat King wrote:
>>>
>>>>On February 07, 2001 at 10:59:31, Pat King wrote:
>>>>
>>>>>I have seen it written here that with 64 bit Zobrist hashing, the perfect key
>>>>>should change 32 bits. When I had what I thought to be hashing problems, I
>>>>>captured some stats on my hash keys. I found that most of them changed 28-36
>>>>>bits (within 4 of the ideal 32), with a few outliers as far as 13 bits from
>>>>>"perfection". I also checked that I was not generating duplicate keys. How
>>>>>good or bad is this?
>>>>>Should I work on the average, or the outliers? Any comments appreciated :)
>>>>>
>>>>>Pat
>>>>Thanks to all for your thoughtful replies. For what it's worth, the only change
>>>>I've made is to generate a key set with hamming distance 31-33, with a
>>>>significant improvement in hash performance (thanks to Ricardo for providing a
>>>>reasonable argument to justify the 32 bit standard). I have yet to compare this
>>>>set with Rob's 16 bit criterion for XORing any 2 keys.
>>>>
>>>>Pat
>>>
>>>But 32 isn't optimal. As high as possible is optimal.
>>
>>
>>So far the conclusions I've come up with are:
>>
>>(1) A key of 0 should be excluded, since a position with a particular piece
>>on a particular square would produce the same hash signature as the same
>>position without that piece.
>>(2) Keys with a close hamming distance to zero should also be excluded.
>>(3) Pairs of keys of the form x and ~x should be excluded, since two such
>>pairs will XOR to zero.
>>(4) Pairs with a close hamming distance to some x and ~x should also be
>>excluded.
>>(5) A key set whose members have a hamming distance > 21 and < 43 with
>>respect to each other *and* to zero satisfies (1)-(4) above, and such a set
>>can be quickly produced within the program without having to read in a file
>>of keys: just use the same random number seed every time.
>>
>>I'm trying to find sources on the net to verify these conclusions, but so far no
>>luck. What do you think?
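This matches my experience. For what it's worth, here's a rough sketch of the
kind of generator I have in mind, adapted to your 22-42 window. It's plain
rejection sampling: draw a 64-bit random, throw it away if its distance to
zero or to any already-accepted key falls outside the window. rand64() is an
assumed stand-in for whatever PRNG you use; seed it with a fixed constant and
the set is reproducible without a key file.

    /* Sketch only: NKEYS Zobrist keys with pairwise hamming distance
       (and distance to zero) in [22, 42], per points (1)-(5) above.
       rand64() is assumed: any decent uniform 64-bit PRNG, seeded
       with a fixed constant. */

    #include <stdint.h>

    #define NKEYS  800
    #define HD_MIN 22
    #define HD_MAX 42

    extern uint64_t rand64(void);       /* assumed PRNG, not shown */

    static int popcount64(uint64_t x)
    {
        int n = 0;
        while (x) { x &= x - 1; n++; }  /* clear lowest set bit */
        return n;
    }

    void gen_keys(uint64_t key[NKEYS])
    {
        for (int i = 0; i < NKEYS; i++) {
            uint64_t cand;
            int ok;
            do {
                cand = rand64();
                int pc = popcount64(cand);   /* distance to zero */
                ok = (pc >= HD_MIN && pc <= HD_MAX);
                for (int j = 0; ok && j < i; j++) {
                    int hd = popcount64(cand ^ key[j]);
                    if (hd < HD_MIN || hd > HD_MAX)
                        ok = 0;
                }
            } while (!ok);
            key[i] = cand;
        }
    }

Note that x and ~x sit at distance 64, so the pairwise check covers (3) and
(4) for free. Rejection gets slow toward the end of the 800 keys, but it
does finish.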
>
>I agree.  Here's another way of looking at it--nothing really rigorous
>though.  Numbers that flip too many bits are just as bad as numbers that
>flip too few.  Two such numbers (that flip lots of bits) XOR'ed to the
>hashcode will be equivalent to XOR'ing a number that flips only a few bits
>since the 2nd number will tend to undo what the first did.
Which is why I was aiming for 32 per key, neither too many nor too few. It's
definitely an improvement over what I had.
>
>You really want each number to scramble the hashcode as much as possible so
>that positions that are "near" each other get very different hashcodes.
>(The idea being that all the positions under consideration during a search
>will sort of be near each other (relatively speaking).  If we ensure they
>map to very different hashcodes, then we will tend to avoid false matches
>despite the tremendous oversubscription to each hashcode.)
>
>If you have two numbers that flip lots of bits (have lots of ones), then
>they'll have lots of bits in common and so a small hamming distance (for
>instance, two 64-bit numbers with 60 ones each can differ in at most 8
>positions).  If two numbers are very different (approaching x and ~x) then
>their XOR will have lots of ones.  So the effect of using numbers with too
>large a hamming distance is nearly the same as numbers with too small a
>hamming distance.
>
>My intuition tells me to split the difference and go with a hamming distance
>of 32 as optimum.  Getting a set of 800 numbers all with a hamming distance
>of 32 from each other is likely (mathematically) impossible.  And anyway,
>their pairwise combination won't have that characteristic...
Actually, I believe it will, at least on average. Look at it bit by bit: a
hamming distance of 32 out of 64 is what you get when each bit is equally
likely to be 0 or 1, i.e. P(0) = P(1) = 0.5. For two independent keys A and
B, each of the four (A bit, B bit) combinations then has probability 0.25,
and the XOR truth table gives a 1 in two of the four cases (0^1 and 1^0), so
each bit of A^B is 1 with probability 0.25 + 0.25 = 0.5. That means A^B has
an expected popcount, i.e. hamming distance from zero, of 64 * 0.5 = 32. So
the 32 hamming distance is, in some sense, stable under XOR. Whether this is
good, bad, or indifferent...
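If you'd rather see it than derive it, a quick brute-force check does the
trick. This is a sketch only; xorshift64() here is just a stand-in PRNG for
the demonstration, not anyone's actual key generator.

    /* Sketch: mean popcount of the XOR of random 64-bit key pairs.
       Should print a value very close to 32. */

    #include <stdio.h>
    #include <stdint.h>

    static uint64_t state = 0x9E3779B97F4A7C15ULL;  /* any nonzero seed */

    static uint64_t xorshift64(void)    /* stand-in PRNG */
    {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    static int popcount64(uint64_t x)
    {
        int n = 0;
        while (x) { x &= x - 1; n++; }
        return n;
    }

    int main(void)
    {
        long trials = 1000000, sum = 0;
        for (long t = 0; t < trials; t++)
            sum += popcount64(xorshift64() ^ xorshift64());
        printf("mean hamming distance = %.3f\n", (double)sum / trials);
        return 0;
    }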
>
>I guess what you really need is a set of numbers that are as "linearly
>independent" as possible so that it takes the combination of many of them
>to reproduce one of them.  If you only had 64 different features to
>encode, you could have each of these numbers be completely independent
>of the others (not expressible as an XOR combination of the rest) and
>each resultant hashcode would map one-to-one with the positions so
>encoded.  But we have 800 possible features and don't want to use 800
>bits.
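That "linear independence" is something you can actually measure: treat each
64-bit key as a vector over GF(2), where XOR is addition, and compute the
rank of the set by Gaussian elimination. A rough sketch (gf2_rank() is just
a hypothetical helper name):

    /* Sketch: rank of a key set over GF(2).  800 keys live in a
       64-dimensional space, so the rank can be at most 64; full
       independence is impossible, but you do want the full 64. */

    #include <stdint.h>

    int gf2_rank(const uint64_t *key, int n)
    {
        uint64_t basis[64] = {0};   /* basis[b] has leading bit b */
        int rank = 0;

        for (int i = 0; i < n; i++) {
            uint64_t v = key[i];
            for (int b = 63; b >= 0 && v; b--) {
                if (!((v >> b) & 1))
                    continue;           /* bit b not set in v */
                if (basis[b])
                    v ^= basis[b];      /* eliminate bit b, keep reducing */
                else {
                    basis[b] = v;       /* new independent direction */
                    rank++;
                    v = 0;              /* done with this key */
                }
            }
        }
        return rank;
    }

If this comes back under 64, some key is exactly the XOR of others and a
whole family of positions will collide. It says nothing about *small*
subsets XORing to zero, though, which is where the BCH idea below would
come in.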
>
>This error-correcting code (BCH) idea sounds interesting.  Maybe someone will
>generate a nice optimal set of 800+ numbers that we all could use :).
>
>-Dan.


