Computer Chess Club Archives


Subject: Re: Hash codes - how good is good enough?

Author: Dan Newman

Date: 23:34:35 02/09/01


On February 09, 2001 at 19:34:03, Ricardo Gibert wrote:

>On February 09, 2001 at 19:04:45, David Rasmussen wrote:
>
>>On February 09, 2001 at 10:39:09, Pat King wrote:
>>
>>>On February 07, 2001 at 10:59:31, Pat King wrote:
>>>
>>>>I have seen it written here that with 64 bit Zobrist hashing, the perfect key
>>>>should change 32 bits. When I had what I thought to be hashing problems, I
>>>>captured some stats on my hash keys. I found that most of them changed 28-36
>>>>bits (within 4) with a few outliers as far as 13 bits from "perfection". I also
>>>>checked that I was not generating duplicate keys. How good or bad is this?
>>>>Should I work on the average, or the outliers? Any comments appreciated :)
>>>>
>>>>Pat
>>>Thanks to all for your thoughtful replies. For what it's worth, the only change
>>>I've made is to generate a key set with hamming distance 31-33, with a
>>>significant improvement in hash performance (thanks to Ricardo for providing a
>>>reasonable argument to justify the 32 bit standard). I have yet to compare this
>>>set with Rob's 16 bit criteria for XORing any 2 keys.
>>>
>>>Pat
>>
>>But 32 isn't optimal. As high as possible is optimal.
>
>
>So far the conclusions I've come up with are:
>
>(1) A key = 0 should be excluded, since a position with a particular piece on a
>particular square would produce the same hash signature as the same position
>without that piece.
>(2) Keys with a close hamming distance to zero should also be excluded.
>(3) Pairs of keys of the form x and ~x should be excluded, since 2 such pairs
>will XOR to zero.
>(4) Pairs with a close hamming distance to x and ~x should also be excluded.
>(5) A key set with a hamming distance > 21 and < 43 with respect to each other
>*and* to zero satisfies (1)-(4) above and can be quickly produced within the
>program without having to read in a file with the keys. Just use the same
>random number seed.
>
>I'm trying to find sources on the net to verify these conclusions, but so far no
>luck. What do you think?
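
For concreteness, criteria (1)-(5) might be coded up roughly like this. Only a sketch, in Python: the [22, 42] window follows the "> 21 and < 43" rule above, and the seed is an arbitrary stand-in, not anyone's actual generator.

```python
import random

def popcount(x):
    # Number of one bits in a 64-bit word.
    return bin(x).count("1")

def generate_keys(n, lo=22, hi=42, seed=12345):
    # Produce n 64-bit Zobrist keys whose hamming distance to zero and
    # to every previously accepted key lies in [lo, hi].  A fixed seed
    # lets the program rebuild the identical set at startup instead of
    # reading the keys from a file.
    rng = random.Random(seed)
    keys = []
    while len(keys) < n:
        cand = rng.getrandbits(64)
        # Distance to zero is just the popcount of the key itself;
        # this enforces (1) and (2).
        if not lo <= popcount(cand) <= hi:
            continue
        # Distance to every accepted key; near-complements get rejected
        # too, since popcount(x ^ ~x) = 64 > hi, covering (3) and (4).
        if all(lo <= popcount(cand ^ k) <= hi for k in keys):
            keys.append(cand)
    return keys
```

Rejection is cheap here because a random 64-bit key's popcount already lands inside [22, 42] the vast majority of the time.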

I agree.  Here's another way of looking at it, though nothing really
rigorous.  Numbers that flip too many bits are just as bad as numbers that
flip too few.  XOR'ing two such numbers (that each flip lots of bits) into
the hashcode is equivalent to XOR'ing in a single number that flips only a
few bits, since the 2nd number will tend to undo what the first did.
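
A contrived illustration (hand-picked constants, not real keys): two keys that each flip 60 bits, but between them flip only 8.

```python
def popcount(x):
    # Number of one bits in a 64-bit word.
    return bin(x).count("1")

# Two keys that each flip 60 of the hashcode's 64 bits...
a = 0xFFFFFFFFFFFFFFF0
b = 0xFFFFFFFFFFFFFF0F
# ...but together flip only 8, because they mostly undo each other.
print(popcount(a), popcount(b), popcount(a ^ b))  # 60 60 8
```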

You really want each number to scramble the hashcode as much as possible so
that positions that are "near" each other get very different hashcodes.
(The idea being that all the positions under consideration during a search
will sort of be near each other, relatively speaking.  If we ensure they
map to very different hashcodes, then we will tend to avoid false matches
despite the tremendous oversubscription of each hashcode.)

If you have two numbers that each flip lots of bits (have lots of ones),
then they'll have lots of bits in common and a small hamming distance from
each other.  If two numbers are very different (approaching x and ~x), then
their XOR will have lots of ones.  So the effect of using numbers with too
large a hamming distance is nearly the same as that of using numbers with
too small a hamming distance.
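
The other extreme can be seen the same way (again, made-up constants): take a key that is nearly the complement of another.

```python
MASK = (1 << 64) - 1  # keep everything in 64 bits

def popcount(x):
    # Number of one bits in a 64-bit word.
    return bin(x).count("1")

x = 0x123456789ABCDEF0
y = (~x & MASK) ^ 0x0F  # the complement of x except in 4 low bits
# Their XOR flips 60 of 64 bits: the same effective problem as two
# heavy keys, just approached from the "too different" side.
print(popcount(x ^ y))  # 60
```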

My intuition tells me to split the difference and go with a hamming distance
of 32 as the optimum.  Getting a set of 800 numbers all at a hamming distance
of exactly 32 from each other is likely (mathematically) impossible.  And
anyway, their pairwise XOR combinations won't have that property either...
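
Random keys already sit close to that anyway: pairwise hamming distances of 64-bit random numbers follow a binomial distribution centered on 32, with a spread of a few bits.  A quick, purely illustrative check:

```python
import random

rng = random.Random(0)  # arbitrary seed, just for reproducibility
keys = [rng.getrandbits(64) for _ in range(200)]
dists = [bin(a ^ b).count("1")
         for i, a in enumerate(keys) for b in keys[i + 1:]]
mean = sum(dists) / len(dists)
# The average pairwise distance lands very close to 32, while
# individual pairs spread out several bits to either side.
print(mean, min(dists), max(dists))
```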

I guess what you really need is a set of numbers that are as "linearly
independent" as possible so that it takes the combination of many of them
to reproduce one of them.  If you only had 64 different features to
encode, you could have each of these numbers be completely independent
of the others (not expressible as an XOR combination of the rest), and
each resultant hashcode would map one-to-one with the positions so
encoded.  But we have 800 possible features and don't want to use 800
bits.
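
The "linear independence" here is exactly linear independence over GF(2), where XOR plays the role of vector addition: 64-bit keys span at most a 64-dimensional space, so any 65th key is already an XOR combination of independent ones.  A quick rank computation (ordinary Gaussian elimination over GF(2), not tied to any particular engine):

```python
import random

def gf2_rank(keys):
    # Rank of the keys viewed as vectors over GF(2), computed by
    # Gaussian elimination with XOR as row addition.
    basis = [0] * 64  # basis[i]: a reduced key whose top set bit is i
    rank = 0
    for k in keys:
        for i in range(63, -1, -1):
            if not (k >> i) & 1:
                continue
            if basis[i] == 0:
                basis[i] = k
                rank += 1
                break
            k ^= basis[i]  # eliminate bit i and keep reducing
        # If k reduced all the way to 0, it was an XOR of earlier keys.
    return rank

rng = random.Random(1)  # arbitrary seed
keys = [rng.getrandbits(64) for _ in range(800)]
print(gf2_rank(keys))  # can never exceed 64
```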

This error-correcting code (BCH) idea sounds interesting.  Maybe someone
will generate a nice optimal set of 800+ numbers that we all could use :).

-Dan.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.