Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hash Collisions

Author: Robert Hyatt

Date: 19:41:38 04/08/01

Go up one level in this thread


On April 07, 2001 at 13:28:56, Andrew Dados wrote:

>On April 06, 2001 at 23:59:16, Robert Hyatt wrote:
>
>>On April 06, 2001 at 11:57:23, David Rasmussen wrote:
>>
>>>On April 06, 2001 at 10:43:23, Robert Hyatt wrote:
>>>
>>>>>
>>>>>I tested the numbers you use in Crafty, and the minimum hamming distance between
>>>>>any pair of numbers you generate, is 14. That's even worse that 16. Just for
>>>>>your information, only one pair of the numbers you generate, have this low
>>>>>distance. It is the 185th number and the 394th number you generate. That is, the
>>>>>keys for w_bishop_random[15] and b_queen_random[32] have a hamming distance of
>>>>>14.
>>>>>
>>>>>Am I missing something here or are you? :)
>>>>
>>>>
>>>>Actually I don't know.  My first test was to do some trillion node searches
>>>>and measure _actual_ collisions.  I stored the actual hash key (64 bits) as I
>>>>do now, plus the real board position (32 bytes, 4 bits for each square).  And
>>>>each time I got a hash hit, I did a real position match as well.  For a test
>>>>that searched about 1 trillion nodes per move, over a part of a full game,
>>>>I get _zero_ collisions.
>>>>
>>>
>>>So do I, after I corrected my search bug, as you can read elsewhere.
>>>
>>>>As far as the minimum hamming distance goes, I'm much less worried about that
>>>>so long as it is > 8.  I am more interested in the _average_ hamming distance.
>>>>
>>>
>>>Why? The minimum distance pairs are the worst cases, but they are still there.
>>>
>>>>I once modified the code to require > 24.  It took quite a while to compute
>>>>the  needed values with that constraint, but it finished.  I could find no
>>>>improvement in hashing collisions however, so I tossed that test away.
>>>>
>>>
>>>I don't collisions either now, so it is very hard to test the effect of
>>>different sets of numbers, as none of the sets give collisions within even long
>>>amounts of time. This hints that it might not be worth the effort to generate
>>>particularly good numbers at all. But it is still theoretically interesting, I
>>>think.
>>>
>>>>Another important thing you probably are not testing is "sensibility".  IE
>>>>no need to test the two random values for hamming distance if the two values
>>>>represent squares that can not be moved to (IE a bishop on light squares can't
>>>>ever use the other half of the board, nor those random values.  The more likely
>>>>two squares can be hit in one move, the more important that hamming distance is.
>>>>
>>>
>>>Of course.
>>>
>>>>I also don't really care about tha hamming distance between a bishop and a
>>>>knight, since this hashing scheme is all about drastically changing the key
>>>>by moving a single piece.  That is probably why I didn't see a 14 distance,
>>>>since I didn't check the numbers like that.  Probably about the best you can
>>>>do overall is a distance of 32 - log2(#values).  That is assuming you compare
>>>>_every_ number against every other number, something I didn't try to do when I
>>>>was testing.
>>>
>>>This is a question for the domain of coding theory. As I've posted elsewhere,
>>>the non-linear 64-bit code with the largest dimension that we know of today, is
>>>an extended BCH code.
>>>
>>>It is entirely possible to generate 840 numbers with minimum hamming distance 23
>>>so 32 - log2(#values) is not a bound.
>>
>>
>>I don't know why I wrote 32 -...  I meant 64 -, obviously, as you can do
>>2 numbers with a hamming distance of 64.  4 with 63, etc...
>
>How is it possible to find 4 - 64 bit numbers with hamming distance of 63?
>I think only 2 numbers are possible.
>
>-Andrew-

I was counting the two original numbers with a distance of 64.  IE >=64,
then >=63, etc...

Bob


>
>>
>>However, I thought your original premise was that you _were_ getting collisions
>>based on the bad hash move test.  There are two explanations.  (1) you are
>>storing a bogus move, or else you are somehow clobbering the move part of a
>>hash entry without knowing it;  or (1) your hashing is broken and you _are_
>>getting real collisions, which suggests that your RNG is not doing a reasonable
>>job.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.