Computer Chess Club Archives


Subject: Re: Hash collisions and a little maths

Author: Dan Newman

Date: 14:38:53 02/24/99



On February 24, 1999 at 06:11:22, Rémi Coulom wrote:

>On February 18, 1999 at 21:06:53, Vincent Diepeveen wrote:
>
>[...]
>>
>>Cool that there is someone who can fiddle with math.
>>
>>Can you change all this to a function that
>>gives the estimated number of hash collisions with inputs:
>>
>>  - #bits
>>  - size of hashtable
>>  - loading factor of hashtable
>>
>>Many thanks,
>>Vincent
>>
>[...]
>
>If you assume that hash codes are independent and identically distributed (which
>is wrong, but might be a good approximation for this purpose), and that your
>hash table is always full (which is the case if you do not clear it after each
>move), then the probability of a collision for a new position (one that is not
>already in the table) is pc = (size of hash table) / (number of hash codes).
>You will have to multiply this probability by the number of rehashes you do if
>you use one part of the hash code for indexing and that part is not stored in
>the hash entry for comparison.
>
>Under these hypotheses, we have a Poisson process, and the average time before
>the first collision will be 1/pc probes. Note that these are probes with
>original positions and do not count positions that are already in the hash
>table.
>
>For instance, TCB crashed on a hash collision last weekend, as it was using
>2^18 hash entries, each of them storing 32 bits of hash code, with a rehash
>of 2. I had pc = 2 * 2^18 / 2^(18 + 32), hence 1/pc = 2^31, which makes
>collisions very likely to happen in a long match.
>
>I think that testing the legality of the hash move is a good way to reduce pc
>significantly. I also think that using the Zobrist technique for computing hash
>codes makes collisions less likely, since hash codes are not independent and
>identically distributed: the closer the positions, the less likely a collision.
>A good choice of hash keys can also help (see my post about BCH codes).
>
>A good way to measure all of this would be to run experiments with a small
>number of hash-code bits and measure the collision rate with different
>techniques: with or without the legality test on the hash move, with random or
>BCH hash keys, etc. If I find time to do it one day, I will post the results
>here.
>
>Remi
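
The estimate in the quoted paragraphs can be packaged as roughly the function
Vincent asked for.  Here is a minimal sketch in C (the function name and
parameters are mine); like Remi, it assumes a full table and independent,
uniformly distributed hash codes:

#include <math.h>

/*
 * key_bits : total number of hash-code bits (index bits + stored bits)
 * entries  : number of hash table entries (table assumed always full)
 * rehash   : number of probes per lookup, counted when the index bits
 *            are not stored back in the entry for comparison
 *
 * Returns the expected number of new (not yet stored) positions probed
 * before the first false match, i.e. 1/pc with
 * pc = rehash * entries / 2^key_bits.
 */
double probes_per_collision(int key_bits, double entries, int rehash)
{
    double pc = rehash * entries / pow(2.0, key_bits);
    return 1.0 / pc;
}

With the TCB numbers above, probes_per_collision(50, 262144.0 /* 2^18 */, 2)
gives 2^31 probes, matching Remi's figure.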

Here's what I do to measure hashing error rates.

With 64 bits it's very difficult to measure any sort of error rate, as you
might run a day or so before you get one (given a good set of numbers in
your Zobrist table).  So, what I do is count the errors that would result
from using a (much) smaller hashcode.  32 bits will give you quite a high
error rate; I typically get about 1 error per second with 32 bits.

Now, on to detection.  What I do is measure what the probing error rate
would have been if I had only used 32 bits (or whatever).  I still use the
full 64 bits to do the lookup, but I pretend that I don't, in order to
measure the (potential) error rate.  I have a separate ifdef'd section of
code in which I pretend to use only 12 bits for the key (20 having been used
for the index).  The remaining 32 bits of the hashcode are used to detect
the "error".  The following is a somewhat altered snip of code from my
current program (I use 2 32-bit numbers, hash1 and hash2, for the 64-bit
hashcode):


int probe_ttable( int side, int depth, int *value, int *move, int ply)
{
    ttable_entry_t *ttep = ttable[side] + (curply->hash1 & index_mask);

    *move = 0;

#ifdef MEASURE_HASHING_ERROR
    //
    // This measures the lower bound of the number of errors that
    // would have occurred if only using a 32-bit hashcode.
    //
    if( ttep->hash2 != curply->hash2
      && ((ttep->hash1 ^ curply->hash1) & LOCK_MASK) == 0 )
        hash_errors++;
#endif

    if( ttep->hash2 == curply->hash2
      && ((ttep->hash1 ^ curply->hash1) & LOCK_MASK) == 0 ) {
        //
        // compare depth to draft, retrieve stuff, etc.
        //
    }
    //etc.
}


Above, ttep->hash1 stores more than just hashcode bits, so it holds only part
of the 32-bit hash1 portion of the hashcode (the other part of hash1 having
been used as index bits), which is why the comparison must be masked off with
LOCK_MASK.
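
For concreteness, here is a sketch of how the two masks could be defined for
the 20-bit index and 12-bit lock mentioned above (a sketch only: since the
real ttep->hash1 packs other data alongside the lock bits, the real LOCK_MASK
is laid out differently):

#define INDEX_BITS  20
#define index_mask  ((1u << INDEX_BITS) - 1)  /* low 20 bits of hash1 pick the entry      */
#define LOCK_MASK   (~index_mask)             /* remaining 12 bits of hash1 form the lock */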

This same technique can be used to detect errors in much larger hashcodes.
You could, for instance, set aside a mere 3 bits to detect errors and do a
fair job of detecting errors in a 61-bit hashcode.  You would of course miss
approximately 1 in 8 of them...  But if you measure enough of them you can
correct the error rate, because you know the fraction you must have missed.
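
As a quick sketch of that correction (the function name is mine): with k check
bits, a fraction of about 2^-k of the real errors also match the check bits
and so go undetected, which you can divide back out:

/* Scale the observed error count up by the fraction a k-bit check misses. */
double corrected_errors(double observed, int check_bits)
{
    double missed = 1.0 / (double)(1u << check_bits);  /* about 2^-k slip through */
    return observed / (1.0 - missed);                  /* k = 3: observed * 8/7   */
}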

This trick allows you to measure hashing errors without much cost in either
hash table size or node rate (unlike the technique of stuffing an exact
representation of the position into the table).

-Dan.


