Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hash Collisions

Author: Andrew Dados

Date: 10:28:56 04/07/01

Go up one level in this thread


On April 06, 2001 at 23:59:16, Robert Hyatt wrote:

>On April 06, 2001 at 11:57:23, David Rasmussen wrote:
>
>>On April 06, 2001 at 10:43:23, Robert Hyatt wrote:
>>
>>>>
>>>>I tested the numbers you use in Crafty, and the minimum hamming distance between
>>>>any pair of numbers you generate, is 14. That's even worse that 16. Just for
>>>>your information, only one pair of the numbers you generate, have this low
>>>>distance. It is the 185th number and the 394th number you generate. That is, the
>>>>keys for w_bishop_random[15] and b_queen_random[32] have a hamming distance of
>>>>14.
>>>>
>>>>Am I missing something here or are you? :)
>>>
>>>
>>>Actually I don't know.  My first test was to do some trillion node searches
>>>and measure _actual_ collisions.  I stored the actual hash key (64 bits) as I
>>>do now, plus the real board position (32 bytes, 4 bits for each square).  And
>>>each time I got a hash hit, I did a real position match as well.  For a test
>>>that searched about 1 trillion nodes per move, over a part of a full game,
>>>I get _zero_ collisions.
>>>
>>
>>So do I, after I corrected my search bug, as you can read elsewhere.
>>
>>>As far as the minimum hamming distance goes, I'm much less worried about that
>>>so long as it is > 8.  I am more interested in the _average_ hamming distance.
>>>
>>
>>Why? The minimum distance pairs are the worst cases, but they are still there.
>>
>>>I once modified the code to require > 24.  It took quite a while to compute
>>>the  needed values with that constraint, but it finished.  I could find no
>>>improvement in hashing collisions however, so I tossed that test away.
>>>
>>
>>I don't collisions either now, so it is very hard to test the effect of
>>different sets of numbers, as none of the sets give collisions within even long
>>amounts of time. This hints that it might not be worth the effort to generate
>>particularly good numbers at all. But it is still theoretically interesting, I
>>think.
>>
>>>Another important thing you probably are not testing is "sensibility".  IE
>>>no need to test the two random values for hamming distance if the two values
>>>represent squares that can not be moved to (IE a bishop on light squares can't
>>>ever use the other half of the board, nor those random values.  The more likely
>>>two squares can be hit in one move, the more important that hamming distance is.
>>>
>>
>>Of course.
>>
>>>I also don't really care about tha hamming distance between a bishop and a
>>>knight, since this hashing scheme is all about drastically changing the key
>>>by moving a single piece.  That is probably why I didn't see a 14 distance,
>>>since I didn't check the numbers like that.  Probably about the best you can
>>>do overall is a distance of 32 - log2(#values).  That is assuming you compare
>>>_every_ number against every other number, something I didn't try to do when I
>>>was testing.
>>
>>This is a question for the domain of coding theory. As I've posted elsewhere,
>>the non-linear 64-bit code with the largest dimension that we know of today, is
>>an extended BCH code.
>>
>>It is entirely possible to generate 840 numbers with minimum hamming distance 23
>>so 32 - log2(#values) is not a bound.
>
>
>I don't know why I wrote 32 -...  I meant 64 -, obviously, as you can do
>2 numbers with a hamming distance of 64.  4 with 63, etc...

How is it possible to find 4 - 64 bit numbers with hamming distance of 63?
I think only 2 numbers are possible.

-Andrew-

>
>However, I thought your original premise was that you _were_ getting collisions
>based on the bad hash move test.  There are two explanations.  (1) you are
>storing a bogus move, or else you are somehow clobbering the move part of a
>hash entry without knowing it;  or (1) your hashing is broken and you _are_
>getting real collisions, which suggests that your RNG is not doing a reasonable
>job.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.