Computer Chess Club Archives


Subject: Re: Hash codes - how good is good enough?

Author: Dan Newman

Date: 02:58:17 02/11/01


On February 10, 2001 at 10:13:50, Pat King wrote:

>On February 10, 2001 at 02:34:35, Dan Newman wrote:
>
>>On February 09, 2001 at 19:34:03, Ricardo Gibert wrote:
>>
>>>On February 09, 2001 at 19:04:45, David Rasmussen wrote:
>>>
>>>>On February 09, 2001 at 10:39:09, Pat King wrote:
>>>>
>>>>>On February 07, 2001 at 10:59:31, Pat King wrote:
>>>>>
>>>>>>I have seen it written here that with 64 bit Zobrist hashing, the perfect key
>>>>>>should change 32 bits. When I had what I thought to be hashing problems, I
>>>>>>captured some stats on my hash keys. I found that most of them changed 28-36
>>>>>>bits (within 4) with a few outliers as far as 13 bits from "perfection". I also
>>>>>>checked that I was not generating duplicate keys. How good or bad is this?
>>>>>>Should I work on the average, or the outliers? Any comments appreciated :)
>>>>>>
>>>>>>Pat
>>>>>Thanks to all for your thoughtful replies. For what it's worth, the only change
>>>>>I've made is to generate a key set with hamming distance 31-33, with a
>>>>>significant improvement in hash performance (thanks to Ricardo for providing a
>>>>>reasonable argument to justify the 32 bit standard). I have yet to compare this
>>>>>set with Rob's 16 bit criteria for XORing any 2 keys.
>>>>>
>>>>>Pat
>>>>
>>>>But 32 isn't optimal. As high as possible is optimal.
>>>
>>>
>>>So far the conclusions I've come up with are:
>>>
>>>(1) A Key = 0 should be excluded, since positions with a particular piece on a
>>>particular square will produce the same hash signature as the same position
>>>without that piece.
>>>(2) Keys with a close hamming distance to zero should also be excluded.
>>>(3) Pairs of keys of the form x and ~x should be excluded, since 2 such pairs
>>>will XOR to zero.
>>>(4) Pairs with a close hamming distance to x and ~x should also be excluded.
>>>(5) A key set with a hamming distance > 21 and < 43 with respect to each other
>>>*and* to zero will satisfy (1)-(4) above, and it can be quickly produced within
>>>the program without having to read in a file with the keys. Just use the same
>>>random number seed.
>>>
>>>I'm trying to find sources on the net to verify these conclusions, but so far no
>>>luck. What do you think?
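A rough sketch of what (5) might look like in practice -- untested, and the
table size, PRNG, and names here are just placeholders, not anyone's real
engine code:

#include <stdint.h>

#define NKEYS 800                 /* e.g. 12 piece types x 64 squares + extras */

static uint64_t rng_state;

static uint64_t rand64(void)      /* xorshift64: simple, repeatable 64-bit PRNG */
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

static int popcount64(uint64_t x) /* number of 1 bits */
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

void init_zobrist(uint64_t key[NKEYS], uint64_t seed)
{
    int i, j, ok;
    rng_state = seed ? seed : 1;  /* same seed -> same key set every run */
    for (i = 0; i < NKEYS; i++) {
        do {
            key[i] = rand64();
            /* distance to zero is just the popcount */
            ok = popcount64(key[i]) > 21 && popcount64(key[i]) < 43;
            for (j = 0; ok && j < i; j++) {
                int d = popcount64(key[i] ^ key[j]);
                if (d < 22 || d > 42)
                    ok = 0;
            }
        } while (!ok);            /* rejection gets slower as the table fills */
    }
}

With the 22..42 window almost every candidate passes while the table is still
small, but rejections pile up toward the 800th key, so it is worth timing this
before tightening the window any further.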
>>
>>I agree.  Here's another way of looking at it--nothing really rigorous
>>though.  Numbers that flip too many bits are just as bad as numbers that
>>flip too few.  Two such numbers (that flip lots of bits) XOR'ed to the
>>hashcode will be equivalent to XOR'ing a number that flips only a few bits
>>since the 2nd number will tend to undo what the first did.
>Which is why I was aiming for 32 per key, neither too much nor too little. It's
>definitely an improvement over what I had.
>>
>>You really want each number to scramble the hashcode as much as possible so
>>that positions that are "near" each other get very different hashcodes.
>>(The idea being that all the positions under consideration during a search
>>will sort of be near each other (relatively speaking).  If we insure they
>>map to very different hashcodes, then we will tend to avoid false matches
>>despite the tremendous oversubscription to each hashcode.)
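(For reference, this is the mechanism in question: making a move only XORs a
handful of these numbers into the running key, so successive search positions
differ by just a few XORs. The array layout and names below are invented for
illustration, not anyone's actual engine code.)

#include <stdint.h>

uint64_t zobrist[12][64];      /* one random key per (piece type, square) */
uint64_t side_to_move_key;     /* extra key XOR'ed in when it is Black to move */
uint64_t hashkey;              /* incrementally maintained hash of the position */

void make_quiet_move(int piece, int from, int to)
{
    hashkey ^= zobrist[piece][from];   /* piece disappears from 'from' */
    hashkey ^= zobrist[piece][to];     /* piece reappears on 'to' */
    hashkey ^= side_to_move_key;       /* and the side to move flips */
    /* castling rights and en passant keys omitted */
}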
>>
>>If you have two numbers that flip lots of bits (have lots of ones), then
>>they'll have lots of bits in common and have a small hamming distance.  If
>>two numbers are very different (approaching x and ~x) then their XOR will
>>have lots of ones.  So the effect of using numbers with too large a hamming
>>distance is nearly the same as numbers with too small a hamming distance.
>>
>>My intuition tells me to split the difference and go with a hamming distance
>>of 32 as optimum.  Getting a set of 800 numbers all with a hamming distance
>>of 32 from each other is likely (mathematically) impossible.  And anyway,
>>their pairwise combination won't have that characteristic...
>Actually, I believe it will. If you look at it bit by bit, with a 32 hamming
>distance each bit position is equally likely to hold a 0 or a 1, so for either
>key P(0)=P(1)=0.5, and any particular combination of two bits (say both 1) has
>P=0.25. In the XOR truth table there are 2 ways to get a 0 and 2 ways to get a
>1, so each bit of the XOR is 1 with P=0.25+0.25=0.5, which puts the expected
>hamming distance of the XOR back at 32. So the 32 hamming dist is, in some
>sense, stable. Whether this is good, bad, or indifferent...


Well, I tried this out by hand, and at first it seemed to confirm this.  I
was able to construct a set of codes that were equidistant from each other,
and sure enough, when XOR'ed pairwise those produced were also at the same
hamming distance.  The set that I constructed all had equal numbers of 0's
and 1's.

OTOH, if I take two numbers with unequal numbers of 0's and 1's, say
A=01111111 and B=01110000 (8-bit for convenience), which have a hamming
distance of 4, then A^B=00001111, and that has a hamming distance of 3
from A and 7 from B.  Of course I might have some trouble making a set of
more than two such numbers...
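Both of those by-hand checks are easy to automate: for any candidate set,
print each pairwise distance and the distance of each pairwise XOR back to
every key in the set. A throwaway sketch (names invented), shown rerunning
the 8-bit example above:

#include <stdio.h>
#include <stdint.h>

static int popcount64(uint64_t x) { int n = 0; while (x) { x &= x - 1; n++; } return n; }

/* print every pairwise distance and, for each pairwise XOR,
   its distance back to every key in the set */
void check_distances(const uint64_t key[], int n)
{
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++) {
            uint64_t x = key[i] ^ key[j];
            printf("d(key%d,key%d) = %2d; xor vs set:", i, j, popcount64(x));
            for (k = 0; k < n; k++)
                printf(" %d", popcount64(x ^ key[k]));
            printf("\n");
        }
}

int main(void)
{
    /* the 8-bit example from above, padded into 64-bit values */
    uint64_t ab[2] = { 0x7F /* 01111111 */, 0x70 /* 01110000 */ };
    check_distances(ab, 2);    /* prints: d(key0,key1) =  4; xor vs set: 3 7 */
    return 0;
}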

Anyway, a set of 64-bit numbers with a mutual hamming distance of exactly
32 which only gives rise to other numbers with that same distance is
actually a very bad set: you can't reach all the different possible 64-bit
bit patterns by XOR'ing them together.  You need a set that spans the
space of hashcodes.
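That "spans the space" property can actually be tested: treat each key as a
vector over GF(2) and Gaussian-eliminate; a rank of 64 means XOR combinations
of the keys can reach every 64-bit pattern. A quick sketch (untested, function
name invented):

#include <stdint.h>

/* rank of the key set viewed as vectors over GF(2); 64 means the set
   spans all 64-bit patterns via XOR */
int gf2_rank(const uint64_t key[], int n)
{
    uint64_t basis[64] = {0};   /* basis[b], if nonzero, has its highest set bit at b */
    int rank = 0, i, b;
    for (i = 0; i < n; i++) {
        uint64_t v = key[i];
        for (b = 63; b >= 0; b--) {
            if (!((v >> b) & 1))
                continue;            /* bit b not set, look lower */
            if (!basis[b]) {
                basis[b] = v;        /* new leading bit -> independent key */
                rank++;
                break;
            }
            v ^= basis[b];           /* cancel bit b and keep reducing */
        }
        /* if v reduced to 0, key[i] was an XOR of earlier keys */
    }
    return rank;
}

Of course, with ~800 keys and only 64 dimensions, rank 64 is the easy part;
the real trouble is any *small* subset of keys XOR'ing to zero, since such a
subset is exactly a pair of nearby positions that hash identically.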

Not really sure what all this means...

-Dan.


>>
>>I guess what you really need is a set of numbers that are as "linearly
>>independent" as possible so that it takes the combination of many of them
>>to reproduce one of them.  If you only had 64 different features to
>>encode, you could have each of these numbers be completely independent
>>of the other (not expressible as an XOR combination of the others) and
>>each resultant hashcode would map one-to-one with the positions so
>>encoded.  But we have 800 possible features and don't want to use 800
>>bits.
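(To make the 64-feature case concrete -- purely illustrative: the 64
single-bit numbers form one such completely independent set, and then the
hashcode literally is the feature bitmap.)

#include <stdint.h>

/* purely illustrative: with only 64 features, the single-bit numbers are a
   completely independent key set -- no key is an XOR of any of the others */
void make_unit_keys(uint64_t key64[64])
{
    int i;
    for (i = 0; i < 64; i++)
        key64[i] = 1ULL << i;
    /* XOR of any subset is exactly that subset's bitmap, so distinct feature
       sets always hash to distinct codes.  With ~800 features squeezed into
       64 bits that one-to-one property can't exist, hence collisions. */
}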
>>
>>This error correcting code (BCH) idea sounds interesting.  Maybe someone will
>>generate a nice optimal set of 800+ numbers that we all could use :).
>>
>>-Dan.


