Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Dumb hashing question

Author: Robert Hyatt

Date: 14:41:00 12/09/99

Go up one level in this thread


On December 09, 1999 at 15:46:54, Dann Corbit wrote:

>If I make a hash key of n bits, how can I make a smaller hash table than 2^n.
>
>I don't get it.  (e.g. My question about shrinking the hash table was met with
>"It does not matter how small the table is -- it is the size of the key that
>matters.").  This caused me to wonder why my brain is so tiny.  I mean, how can
>I have a hash table with a great big key and not allocate enough space to hold
>it? Do you have a smaller hash leader (half-key, quarter-key, whatever) and then
>throw the rest into a list of some kind?
>
>What gives?

There are lots of answers.  This is called the "collision replacement policy".

We often mix the term collision badly.  Some call it the condition that occurs
when two _different_ positions produce the same hash signature.  This is
generally incorrect.  Others call it the condition that occurs when two
different hash signatures end up needing to be stored at the same table address.

The latter is the definition used by Knuth, and is good enough to answer your
question.  If you have a 64 bit hash signature (as I do) you can take any N
bits from that signature to address a table of size 2^N.  IE if you want 4
entries in your table, use the right-most 2 bits of the 64 bit hash signature.
Which gives you a value between 0 and 3, just enough to address your table.

We now have two issues.  Assuming you use a 64 bit hash signature, what is the
probability that two different chess positions produce the same 64 bit hash
signature?  That would cause serious problems, because we store search results
and identify these results by the 64 bit signature, not the full position
description.  If two different positions have the same signature, we would use
search results in the wrong places..  very bad.  But this is _very_ rare. Maybe
1 signature duplication every billion nodes or so, more than acceptably small.

The other issue is when you use a 64 bit signature, but your table is less than
2^64 entries.  So you use some part of the signature.  And now the liklihood of
a collision is real, because _many_ different 64 bit signatures will match in
the rightmost N bits for a 2^N entry table.  You store the full 64 bits in the
entry to be sure you don't get false matches, but at times you have to choose
between entry A and B.  Which to keep?  There are plenty of rules (you can look
at hash.c in Crafty for one approach)...



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.