Computer Chess Club Archives

Subject: Re: Hash Collisions

Author: Robert Hyatt

Date: 08:07:12 02/23/99


On February 23, 1999 at 05:59:24, Rémi Coulom wrote:

>On February 23, 1999 at 04:12:42, Dan Newman wrote:
>>On February 22, 1999 at 14:12:58, Robert Hyatt wrote:
>>>On February 22, 1999 at 11:14:14, Robert Hyatt wrote:
>>>I ran some quick tests... to produce 768 random numbers (64 bits) with a
>>>minimum distance of 16 takes 769 numbers from my Random64() function
>>>in Crafty.  Going to 24 jumps this to a really big number (it took my code
>>>200 million random numbers to find 768 with a Hamming distance >= 24 between
>>>_all_ of them).
>>>One minor trick is that in general, if N is a good number, ~N is also good.
>>I wonder if this is the case?  Say you have a bunch of numbers A, B, C, ...
>>and also ~A, ~B, ~C, ...,  then A ^ ~A == B ^ ~B == C ^ ~C == ~0.  It seems
>>like a less than desirable result to get the same bit pattern from pairwise
>>combinations of these numbers: you are more likely to get the same hashcode
>>for different positions.
>>What you really want to do is select a bunch of numbers that have as large
>>a degree of "linear" independence as you can manage.  You get maximal
>>independence when each number has a different bit turned on:
>>    0000000000000000000000000000000000000000000000000000000000000001
>>    0000000000000000000000000000000000000000000000000000000000000010
>>    0000000000000000000000000000000000000000000000000000000000000100
>>                              ...
>>    1000000000000000000000000000000000000000000000000000000000000000
>>but of course that leaves you with too few numbers.  So you must have
>>more bits turned on.  That means that it will be possible to express some
>>members of this set of numbers as a "linear" combination of others:
>>        X = A ^ C ^ F ^ G ^ ...
>>You really want to make that as difficult as possible.  If members of the
>>set of numbers have too many bits in common (close hamming distance), then
>>you will have trouble -- it becomes "easier" to get the same hashcode from
>>different combinations of numbers.  But the same is true if the set contains
>>members that have too few bits in common (as when you include complements).
>>My intuition tells me that trying for a hamming distance of 32 is ideal --
>>but I don't really know...  (I use a sieve that keeps the hamming distance
>>between 16 and 48 for each pairing.  It gets very slow near the end...)
>I totally agree with Dan here. I also read about maximizing the minimum Hamming
>distance between hash keys, but it does not sound like a good idea to me. I did
>not read the ICCA Journal article mentioned earlier, but JC Weill talks about
>the theory of linear error-correcting codes in his PhD thesis, and about using
>Bose-Chaudhuri-Hocquenghem codes for hash keys. I spent a very long time trying
>to understand what they are. I have not succeeded yet, but I have good hopes of
>understanding them one day.
>In the theory of error control codes, maximizing the minimum Hamming distance
>between hash keys is a way to maximize the error-correction capability of a
>linear code, as far as I understand it. Motivated readers can take a look at
>this course that explains the theory and practice of error control codes in
>detail (especially the "Linear Block Codes" chapter):
>But I think this is not what we really want for hash keys. For instance, these 3
>hash keys have a very high minimum Hamming distance between them, but would not
>be good, because K2 = K0 ^ K1:
>K0 : 0101010101010101
>K1 : 1010101010101010
>K2 : 1111111111111111
>The idea of taking the binary complement of some codes sounds really bad to me.
>It means that if you have K0 = ~K1 and K2 = ~K3, then K0 ^ K1 = K2 ^ K3, as Dan
>pointed out.
>BCH codes look better since, according to what is written in JC Weill's PhD
>thesis, they protect more bits than random codes. This means that you have to
>XOR a larger set of them to get 0. I think this is equivalent to Dan's idea of
>"making the possibility to express some members of the set as linear
>combinations of others as difficult as possible".
>Another point which I think is important to take into consideration when
>generating a good set of hash keys, is that what we want is that positions that
>are close to each other (in "chess distance") should have different hash codes.
>This means that getting 0 by XORing a set of hash codes that contains 'White
>King in e1' with 'White King in g1' is a bigger problem than when XORing 'White
>King in h1' with 'White King in a8' and even bigger than XORing 'White King in
>X' with 'White King in Y' and 'White King in Z', since this last case never
>happens in the difference between any two legal chess positions (there is only
>one White King on the board !).
>An idea I had to generate a good set of hash codes is to generate two
>independent sets of 32 bit hash codes, each of which would be optimized by
>minimizing the number of measured hash collisions in searches on representative
>positions. The two best 32 bit sets would be merged to create the final 64 bit
>keys. I fear that this idea is not practical since it would take a lot of
>computing power.
>I'd like to have the Hamming distance maximizers' opinion about these points.

First, I don't do _any_ of the above in Crafty at present, so for me it is
pretty much a moot point.

Second, the ran[n+1]=~ran[n] was just a simple optimization for the test
program, and even there it was just a quick way to find 768 random numbers
(i.e., it is quicker to compute than a new random number).  I only wanted to
see how many different 64-bit values I would have to try to get a Hamming
distance of 24, because Don's number was way smaller than mine.


Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.