Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hash Collisions

Author: Robert Hyatt

Date: 20:59:16 04/06/01

Go up one level in this thread


On April 06, 2001 at 11:57:23, David Rasmussen wrote:

>On April 06, 2001 at 10:43:23, Robert Hyatt wrote:
>
>>>
>>>I tested the numbers you use in Crafty, and the minimum hamming distance between
>>>any pair of numbers you generate, is 14. That's even worse that 16. Just for
>>>your information, only one pair of the numbers you generate, have this low
>>>distance. It is the 185th number and the 394th number you generate. That is, the
>>>keys for w_bishop_random[15] and b_queen_random[32] have a hamming distance of
>>>14.
>>>
>>>Am I missing something here or are you? :)
>>
>>
>>Actually I don't know.  My first test was to do some trillion node searches
>>and measure _actual_ collisions.  I stored the actual hash key (64 bits) as I
>>do now, plus the real board position (32 bytes, 4 bits for each square).  And
>>each time I got a hash hit, I did a real position match as well.  For a test
>>that searched about 1 trillion nodes per move, over a part of a full game,
>>I get _zero_ collisions.
>>
>
>So do I, after I corrected my search bug, as you can read elsewhere.
>
>>As far as the minimum hamming distance goes, I'm much less worried about that
>>so long as it is > 8.  I am more interested in the _average_ hamming distance.
>>
>
>Why? The minimum distance pairs are the worst cases, but they are still there.
>
>>I once modified the code to require > 24.  It took quite a while to compute
>>the  needed values with that constraint, but it finished.  I could find no
>>improvement in hashing collisions however, so I tossed that test away.
>>
>
>I don't collisions either now, so it is very hard to test the effect of
>different sets of numbers, as none of the sets give collisions within even long
>amounts of time. This hints that it might not be worth the effort to generate
>particularly good numbers at all. But it is still theoretically interesting, I
>think.
>
>>Another important thing you probably are not testing is "sensibility".  IE
>>no need to test the two random values for hamming distance if the two values
>>represent squares that can not be moved to (IE a bishop on light squares can't
>>ever use the other half of the board, nor those random values.  The more likely
>>two squares can be hit in one move, the more important that hamming distance is.
>>
>
>Of course.
>
>>I also don't really care about tha hamming distance between a bishop and a
>>knight, since this hashing scheme is all about drastically changing the key
>>by moving a single piece.  That is probably why I didn't see a 14 distance,
>>since I didn't check the numbers like that.  Probably about the best you can
>>do overall is a distance of 32 - log2(#values).  That is assuming you compare
>>_every_ number against every other number, something I didn't try to do when I
>>was testing.
>
>This is a question for the domain of coding theory. As I've posted elsewhere,
>the non-linear 64-bit code with the largest dimension that we know of today, is
>an extended BCH code.
>
>It is entirely possible to generate 840 numbers with minimum hamming distance 23
>so 32 - log2(#values) is not a bound.


I don't know why I wrote 32 -...  I meant 64 -, obviously, as you can do
2 numbers with a hamming distance of 64.  4 with 63, etc...

However, I thought your original premise was that you _were_ getting collisions
based on the bad hash move test.  There are two explanations.  (1) you are
storing a bogus move, or else you are somehow clobbering the move part of a
hash entry without knowing it;  or (1) your hashing is broken and you _are_
getting real collisions, which suggests that your RNG is not doing a reasonable
job.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.