Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: About debugging HashTables implementation .

Author: Robert Hyatt

Date: 14:24:43 04/12/02

Go up one level in this thread


On April 12, 2002 at 15:44:46, Will Singleton wrote:

>On April 12, 2002 at 10:43:42, Pat King wrote:
>
>>On April 10, 2002 at 20:47:06, Will Singleton wrote:
>>
>>>On April 10, 2002 at 17:31:20, Andrew Williams wrote:
>>>
>>>>On April 10, 2002 at 15:28:26, Pat King wrote:
>>>>
>>>>>On April 09, 2002 at 12:11:08, Andrew Williams wrote:
>>>>>
>>>>>>On April 09, 2002 at 11:56:45, Pat King wrote:
>>>>>>
>>>>>>>On April 08, 2002 at 15:17:46, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>How do you define "collision"?  Normally this means two different board
>>>>>>>>positions produce the same hash signature.  If you are getting _those_ you
>>>>>>>>have a Zobrist hashing bug.
>>>>>>>
>>>>>>>How do you detect collisions, then? Record the entire position in the hash
>>>>>>>table? Even for a debug build, that seems like it would be impractical. >>Although undetected hash bugs are even less practical :)
>>>>>>>
>>>>>>
>>>>>>You check to see if the move stored in the hash entry is legal or not. If
>>>>>>the hash keys match but the move isn't legal in the position, it means that
>>>>>>you have a hash collision.
>>>>>>
>>>>>>Andrew
>>>>>
>>>>>But you rely on the hashed positions being different enough for that move to be
>>>>>illegal. Certainly, your technique detects collisions, but not all collisions!?
>>>>>
>>>>>Pat
>>>>
>>>>Yes. I should have been clearer. I don't have any code in PM to detect
>>>>collisions. As you say, what I described wouldn't catch all collisions.
>>>>
>>>>Andrew
>>>
>>>As always seem to be the case in computer chess, there are competing definitions
>>>for the same terms (collisions of collisions, if you will).
>>>
>>>In the Dec 1996 issue of ICCAJ, Beal and Smith discuss collisions in their
>>>article on multiple-probe hashing.  Your definition of a collision, if I read
>>>you correctly, is what Beal & Smith would call a false-match error (negligible,
>>>in their opinion).  Collisions, otoh, they say occur when two positions gen the
>>>same hash >index<, but different keys.  This, they say, can occur frequently,
>>>and requires a replacement scheme to determine which is kept and which
>>>discarded.
>>
>>The definition of collision being used in this thread was clearly spelled out by
>>Bob, and lurks above in all the re:s above.
>>It is especially relevant to this thread, where we humble beginners (please
>>don't notice how long I've been in that category) struggle to get even the
>>basics right.
>
>
>What Bob Said was this:
>"How do you define "collision"?  Normally this means two different board
>positions produce the same hash signature.  If you are getting _those_ you
>have a Zobrist hashing bug.  If you mean two different signatures hash to the
>same board position then perhaps your 64 bit random numbers are very poorly
>chosen..."
>
>I think what he meant by the second sentence was: if two different keys hash to
>the same spot in the table, then your numbers aren't random enough.
>
>This would follow the definition of collision that has been used in the articles
>I have read.

That is sometimes called a "collision" but it is incorrectly done.  A collision
is technically "two unique positions that produce the same hash signature"  In
normal hashing (not chess) you take something (say a social security number)
and hash it to something used to address a table for random access.  In chess,
we hash a position to a signature, then we hash the signature (usually by
ANDing with a mask) to reduce it to a table index.

This second kind of collision (two hash signatures yield the same table index
when ANDed with some mask) must be handled as you mention.  It boils down to
an effective replacement policy.  The first kind of hashing collision can not
be tolerated, and with 64 bit keys should not happen if the Zobrist random
numbers are chosen very well...



> But it seems clear, to me, that from my testing, collisions are
>common and must be managed by an appropriate replacement scheme.  So I don't
>know why Bob and others seem to adhere to the idea that collisions only occur
>rarely, or if you have a bug.
>

Mainly because when we use the term "collision" we mean two unique chess
positions that produce the same hash signature.  And that is both a problem
that will cause trouble, and based on lots of 64 bit testing, is probably also
a bug.





>If a collision is two different board positions producing the same full 64-bit
>hash key, then what do you call the event in which two different board positions
>produce different keys, but map to the same hash slot?


As I initially said, in "normal" hashing, the "signature" step is not used
directly.  You hash something to an index and go there.  We are doing it in
two steps and _either_ can be called a collision.  However, in my case, I
don't call two different signatures yielding the same table index a collision,
I call that something that just happens naturally.  It isn't a bug at all
assuming the table index distribution seems to be pretty uniform.

One common error I used to see was people building zobrist random numbers by
jamming two 32 bit floats together.  Unfortunately, that didn't produce a 64
bit random number because 9 of the bits (sign + exponent) are not very random.
That led to "clustering of indices" with many "gaps".  Ideally you'd like your
hash signatures to distribute addresses all over the table in a very uniform
way.  And generally good Zobrist numbers will do this effectively...




>
>>
>>>
>>>Their definition implies than collisions only occur during hash store, but it
>>>seems correct that they can also occur during probes.  That is, you probe based
>>>on the index, but find a different key stored there.
>>>
>>>So, it appears that collisions can occur during probe and store, and are reduced
>>>by multiple entries per index (if you count only failures after exhausting the
>>>multiple probes/stores).  Reductions also are made with increasing the hash
>>>size, and possibly the hamming distance in the random number list (though this
>>>isn't clear, as I get better results with maximizing the hamming distance
>>>between the indexes rather than the full number).
>>>
>>>I seem to get a lot of collisions, but I don't really know what a lot of
>>>collisons are, since people count things (and define things) differently.  What
>>>we need is a uniform definition for every chess term.  This would help beginning
>>>and experienced programmers alike.
>>>
>>>Will



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.