Author: Robert Hyatt

Date: 08:07:12 02/23/99

Go up one level in this thread

On February 23, 1999 at 05:59:24, Rémi Coulom wrote: >On February 23, 1999 at 04:12:42, Dan Newman wrote: > >>On February 22, 1999 at 14:12:58, Robert Hyatt wrote: >> >>>On February 22, 1999 at 11:14:14, Robert Hyatt wrote: >>> >[...] >>> >>>I ran some quick tests... to produce a minimum distance of 16, producing >>>768 random numbers (64 bits), takes 769 numbers from my Random64() function >>>in crafty. To go to 24 jumps this to a really big number (it took my code >>>200 million random numbers to find 768 with a hamming distance >= 24 between >>>_all_ of them. >>> >>>One minor trick is that in general, if N is a good number, ~N is also good. >>> >> >>I wonder if this is the case? Say you have a bunch of numbers A, B, C, ... >>and also ~A, ~B, ~C, ..., then A ^ ~A == B ^ ~B == C ^ ~C == ~0. It seems >>like a less than desirable result to get the same bit pattern from pairwise >>combinations of these numbers: you are more likely to get the same hashcode >>for different positions. >> >>What you really want to do is select a bunch of numbers that have as large >>a degree of "linear" independence as you can manage. You get maximal >>independence when each number has a different bit turned on: >> >> 0000000000000000000000000000000000000000000000000000000000000001 >> 0000000000000000000000000000000000000000000000000000000000000010 >> 0000000000000000000000000000000000000000000000000000000000000100 >> ... >> 1000000000000000000000000000000000000000000000000000000000000000 >> >>but of course that leaves you with too few numbers. So you must have >>more bits turned on. That means that it will be possible to express some >>members of this set of numbers as a "linear" combination of other >>members: >> >> X = A ^ C ^ F ^ G ^ ... >> >>You really want to make that as difficult as possible. If members of the >>set of numbers have too many bits in common (close hamming distance), then >>you will have trouble -- it becomes "easier" to get the same hashcode from >>different combinations of numbers. But the same is true if the set contains >>members that have too few bits in common (as when you include complements). >>My intuition tells me that trying for a hamming distance of 32 is ideal -- >>but I don't really know... (I use a sieve that keeps the hamming distance >>between 16 and 48 for each pairing. It gets very slow near the end...) >[...] > >I totally agree with Dan here. I also read about maximizing the minimun Hamming >distance between hash keys, but it does not sound like a good idea to me. I did >not read the ICCA Journal article mentionned earlier, but JC Weill talks about >the theory of linear error-correcting codes in his PhD thesis, and using >Bose-Chaudhuri-Hocquenghem codes for hash keys. I took a very long time trying >to understand what they were. I did not succeed yet, but I have good hopes to >understand them one day. > >In the theory of error control codes, maximizing the minimun Hamming distance >between hash keys is a way to maximize the error-correction capability of a >linear code, as far as I understood it. Motivated readers can take a look at >this course that explains the theory and practice of error control codes in >details (especially the "Linear Block Codes" chapter): >http://schof.colorado.edu/~ecen5682/notes.htm > >But I think this not what we really want for hash keys. For instance, these 3 >hash keys have a very high minimum Hamming distance between them, but would not >be good because K2 = K0 ^ K1: >K0 : 0101010101010101 >K1 : 1010101010101010 >K2 : 1111111111111111 > >The idea of taking the binary complement of some codes sounds really bad to me. >It means that if you have K0 = ~K1 and K2 = ~K3 the K0 ^ K1 = K2 ^ K3, as Dan >wrote. > >BCH codes look better since, according to what is written in JC Weill's PhD >thesis, they protect more bits than random codes. This means that you have to >XOR a larger set of them to get 0. I think this is equivalent to Dan's idea of >"making the possibility to express some members of the set as linear >combinations of others as difficult as possible". > >Another point which I think is important to take into consideration when >generating a good set of hash keys, is that what we want is that positions that >are close to each other (in "chess distance") should have different hash codes. >This means that getting 0 by XORing a set of hash codes that contains 'White >King in e1' with 'White King in g1' is a bigger problem than when XORing 'White >King in h1' with 'White King in a8' and even bigger than XORing 'White King in >X' with 'White King in Y' and 'White King in Z', since this last case never >happens in the difference between any two legal chess positions (there is only >one White King on the board !). > >An idea I had to generate a good set of hash codes is to generate two >independant sets of 32 bit hash codes, each of them would be optimized by >minimizing the number of measured hash collisions in searches on representative >positions. The two best 32 bit sets would be merged to create the final 64 bit >keys. I fear that this idea is not practical since it would take a lot of >computing power. > >I'd like to have the Hamming distance maximizers' opinion about these points. > >Remi First, I don't do _any_ of the above in Crafty at present. So it is a pretty much moot point at present for me. second, the ran[n+1]=~ran[n] was just a simple optimization for the test program. And even there it was just a quick way to find 768 random numbers. (ie that is quicker to compute than a new random number). I only wanted to see how many different 64 bit values I would have to try to get a hamming distance of 24, because Don's number was way smaller than mine.

- Re: Hash Collisions
**Rémi Coulom***03:29:36 02/24/99*- Re: Hash Collisions
**Robert Hyatt***14:05:48 02/24/99*

- Re: Hash Collisions

This page took 0.02 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.