Computer Chess Club Archives




Subject: Re: Hash Collisions

Author: Dan Newman

Date: 01:12:42 02/23/99


On February 22, 1999 at 14:12:58, Robert Hyatt wrote:

>On February 22, 1999 at 11:14:14, Robert Hyatt wrote:
>>On February 22, 1999 at 02:30:37, Don Dailey wrote:
>>>On February 21, 1999 at 15:58:16, Robert Hyatt wrote:
>>>>On February 21, 1999 at 13:23:24, William Bryant wrote:
>>>>>On February 21, 1999 at 10:21:50, Robert Hyatt wrote:
>>>>>>On February 21, 1999 at 01:14:37, KarinsDad wrote:
>>>>>>>I posted this the other day, but it may have been obscured (or not if nobody has
>>>>>>>any information on it).
>>>>>>>I have been wondering if changing the Zobrist hash from a set of random number
>>>>>>>to a set of non-random and very specific numbers could result in a more even
>>>>>>>distribution in the hash table. Has anyone done any work in this area?
>>>>>>Burt Wendroff and Tony Warnock published a paper in the JICCA a few years
>>>>>>ago discussing this.  The main issue is to find the hamming distances between
>>>>>>_all_ the random numbers, and maximize all of them.  Random numbers work fine
>>>>>>if they have been tested (I ran all of mine thru a hamming distance analysis
>>>>>>when I first started).
>>>>>>As far as 'distribution' this is done by the low order N bits
>>>>>>(N=log2(hash_size)) of the hash signature.  You _could_ do the hamming analysis
>>>>>>on just those bits as well.
>>>>>I check all my random numbers for uniqueness, i.e. no duplicates, and use
>>>>>0xFFFFFFFFFFFFFFFF for the side to move (complement the side to move),
>>>>>but would appreciate information on 'hamming distance analysis' if it's not
>>>>>too complicated to post or send via email.
>>>>>Thank you,
>>>>"hamming distance" is simply the number of bits two numbers differ in.  To
>>>>compute the hamming distance for a and b, you just compute popcnt(a^b).  Two
>>>>numbers can differ in all bits.  But more than two can not...  I shoot for
>>>>no worse than 32 bits that are the same in two random numbers...  Assuming
>>>>64 bit values...
>>>Are you sure NO two are closer (in hamming distance) than 32 bits?  How
>>>many numbers are in your table?    Are you talking about the worst
>>>case or is it some kind of average?
>>I don't do this in Crafty.  I do (did) in Cray Blitz, because the trees we
>>searched were far larger.  And what I did there was to generate the random
>>numbers separately, and each new number had to pass the test against all the
>>previous numbers before it was accepted. And yes, this was time-consuming so
>>we made these numbers 'constant'.  And no, I am not sure that all the numbers
>>were separated by a hamming distance of 32...  that goes back in time far enough
>>to make it very fuzzy.
>>However, I don't have that many numbers (1024 you said?) as I only used 12*64
>>since we didn't use any sort of 'boundary squares' in Cray Blitz.  Ditto for
>>Crafty where I also use 12*64.  I have been meaning to go grab that table of
>>numbers and 'steal' it for Crafty, but I haven't yet, because it is in the
>>syntax of 'fortran'.
>>>I have a little piece of code I wrote that generates 64 bit random
>>>numbers one at a time and tests each one against all the ones
>>>previously generated.  If the new number is closer than my specified
>>>hamming distance, I regenerate that number until it is.   To get
>>>even 24 bit distances between any two I've had to generate almost
>>>half a million numbers.   I have 1024 numbers in my table.
>>>I don't believe your random number generator is returning numbers
>>>this good.  Maybe you can precompute them this good, I don't know.
>>>I'm trying right now with 32 bits but it's going awfully slow.
>>I use the Numerical Recipes RNG code.  But you are right, it won't produce
>>such good hamming distances quickly.  I wouldn't be surprised if it takes
>>4 billion numbers to get decent random numbers...
>I ran some quick tests... to produce a minimum distance of 16, producing
>768 random numbers (64 bits), takes 769 numbers from my Random64() function
>in Crafty.  To go to 24 jumps this to a really big number (it took my code
>200 million random numbers to find 768 with a hamming distance >= 24 between
>_all_ of them).
>One minor trick is that in general, if N is a good number, ~N is also good.

I wonder if this is the case?  Say you have a bunch of numbers A, B, C, ...
and also ~A, ~B, ~C, ...,  then A ^ ~A == B ^ ~B == C ^ ~C == ~0.  It seems
like a less than desirable result to get the same bit pattern from pairwise
combinations of these numbers: you are more likely to get the same hashcode
for different positions.

What you really want to do is select a bunch of numbers that have as large
a degree of "linear" independence as you can manage.  You get maximal
independence when each number has a different single bit turned on (1, 2, 4,
8, ..., the powers of two), but of course that leaves you with too few
numbers.  So you must have more bits turned on.  That means that it will be
possible to express some members of this set of numbers as a "linear"
combination of the others:

        X = A ^ C ^ F ^ G ^ ...

You really want to make that as difficult as possible.  If members of the
set of numbers have too many bits in common (close hamming distance), then
you will have trouble -- it becomes "easier" to get the same hashcode from
different combinations of numbers.  But the same is true if the set contains
members that have too few bits in common (as when you include complements).
My intuition tells me that trying for a hamming distance of 32 is ideal --
but I don't really know...  (I use a sieve that keeps the hamming distance
between 16 and 48 for each pairing.  It gets very slow near the end...)

You could also look at this as an exercise in bit flipping.  If your numbers
flip too many bits or too few when they are XOR'ed into the hashcode then
their effects may be too easily reversed by XOR'ing in other numbers.
My intuition is that what you really need is a bunch of 64 bit numbers with
exactly 32 bits turned on, all of which have a decent but not too large
hamming distance between them.

I find it interesting that the Zobrist trick works at all.  Given the number
of possible chess positions versus the number of 64-bit bit patterns, one
can see that that space is vastly oversubscribed: if there are 10**40 chess
positions and 2**64 =~ 2 * 10**19 hashcodes, that means that there are roughly
10**21 positions for each hashcode on average.  I think what makes it work
is that during a search we may only hit 10**6 to 10**8 different but closely
related positions, and those random bit patterns tend to make even closely
related positions have vastly different hashcodes.  If you were to choose
positions at random from the full set of possible positions, you would expect
to find numerous hashing errors in very short order.  And yet this is an
exceedingly rare event when doing a tree search.


>I've been trying to find my notes on CB... but now think that maybe '32' was
>an 'average' number rather than the abs minimum for all the numbers.
>If I had some time, I'd try to study just what the 'optimal' set would be,
>because it must be computable...
>>I'll try to figure out a way to post my values here if you are interested.
>>64 * 12 isn't horribly big, but when you display them as 64 bit values, they
>>are big. The main issue is how to initialize such an array in a portable
>>way...  IE BITBOARD ran[12][64] = {{1,2,3},{4,5,6}};  The numbers 1,2,3
>>default to 'int' which is bad.  I don't know what to do for all compilers
>>(IE 123ULL or (BITBOARD) 123).
>>>So my code is either wrong or I'm not understanding your method.
>>>Can you either clarify or send me your table to analyze?
>>>- Don
