Author: Don Dailey
Date: 21:32:00 02/24/99
I posted this several weeks ago, but I'll repeat it. The simplest way to
measure hash table collision errors is to just add 2 or 3 check bits. If you
get a match on your regular key, but the 3 check bits differ, you have a
collision. On average, 1 out of 8 collisions will not be seen (assuming 3
check bits), so you can assume that you are getting about 1/8 more collisions
than you actually measure. Probably most of us have a few extra bits lying
around in our hash table implementation. They can be used to instrument this
and give you some peace of mind. If you are willing to use several extra
bits, you will get very accurate measurements, but even with 2 bits you will,
over time, get an excellent approximation of your failure rate.

- Don

On February 24, 1999 at 06:11:22, Rémi Coulom wrote:

>On February 18, 1999 at 21:06:53, Vincent Diepeveen wrote:
>
>[...]
>>
>>Cool that there is someone who can fiddle with math.
>>
>>Can you change all this to a function that
>>gives the estimated number of hash collisions with inputs:
>>
>> - #bits
>> - size of hashtable
>> - loading factor of hashtable
>>
>>Many thanks,
>>Vincent
>>
>[...]
>
>If you assume that hash codes are independent and identically distributed
>(which is wrong, but might be a good approximation for this purpose), and
>that your hash table is always full (which is the case if you do not clear
>it after each move), then the probability of a collision for a new position
>(one that is not already in the table) is equal to
>pc = (size of hash table) / (number of hash codes). You will have to
>multiply this probability by the number of rehashes you do if you use one
>part of the hash code for indexing and that part is not stored in the hash
>entry for comparison.
>
>Under these hypotheses, we have a Poisson process and the average time
>before the first collision will be 1/pc probes. Note that these are probes
>with original positions, and do not count positions that are already in the
>hash table.
>
>For instance, TCB crashed on a hash collision last weekend, as it was using
>2^18 hash entries, each of them containing 32 bits of hash code, with a
>rehash of 2. I had pc = 2 * 2^18 / 2^(18 + 32), hence 1/pc = 2^31, which
>makes collisions very likely to happen in a long match.
>
>I think that testing the legality of the hash move is a good way to reduce
>pc significantly. I also think that using the Zobrist technique for
>computing hash codes makes collisions less likely, since hash codes are not
>independent and identically distributed: the closer the positions, the less
>likely a collision. A good choice of hash keys can also help (see my post
>about BCH codes).
>
>A good way to measure all of this would be to run experiments with a small
>number of hash code bits and measure the collision rate with different
>techniques: with or without the test on the hash move, with random or BCH
>hash keys, etc. If I find time to do it one day, I will post the results
>here.
>
>Remi
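
A minimal sketch in C of the check-bit scheme Don describes above, assuming
a 64-bit Zobrist key split into an index part, a 32-bit verification key
stored in the entry, and a few spare check bits. The entry layout, field
names, and helper functions are hypothetical, for illustration only:

/* Hypothetical hash entry: index bits address the table, key32 is the
   normal verification key, and CHECK_BITS extra bits of the full key are
   kept only to detect collisions. */

#include <stdint.h>

#define TABLE_BITS  18
#define TABLE_SIZE  (1u << TABLE_BITS)
#define CHECK_BITS  3
#define CHECK_MASK  ((1u << CHECK_BITS) - 1)

typedef struct {
    uint32_t key32;   /* normal verification key stored in the entry  */
    uint8_t  check;   /* CHECK_BITS extra bits taken from the full key */
    /* ... score, depth, best move, etc. ... */
} HashEntry;

static HashEntry table[TABLE_SIZE];
static unsigned long detected_collisions;

void store(uint64_t full_key /*, score, depth, move, ... */)
{
    HashEntry *e = &table[full_key & (TABLE_SIZE - 1)];
    e->key32 = (uint32_t)(full_key >> TABLE_BITS);
    e->check = (uint8_t)((full_key >> (TABLE_BITS + 32)) & CHECK_MASK);
}

/* Probe: if the stored 32-bit key matches but the check bits disagree, we
   have caught a collision that would otherwise have been taken as a hit. */
HashEntry *probe(uint64_t full_key)
{
    HashEntry *e     = &table[full_key & (TABLE_SIZE - 1)];
    uint32_t   key32 = (uint32_t)(full_key >> TABLE_BITS);
    uint8_t    check = (uint8_t)((full_key >> (TABLE_BITS + 32)) & CHECK_MASK);

    if (e->key32 == key32) {
        if (e->check != check) {
            detected_collisions++;   /* collision caught by the check bits */
            return NULL;             /* treat it as a miss */
        }
        return e;                    /* ordinary hit */
    }
    return NULL;
}

/* With 3 check bits roughly 1/8 of real collisions still slip through, so
   the true count is about detected_collisions * 8/7. */
double estimated_collisions(void)
{
    return detected_collisions * 8.0 / 7.0;
}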
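
And a quick back-of-the-envelope check of the TCB numbers in Rémi's post,
under the same assumptions he states (independent, uniformly distributed
hash codes and a full table). The little program below is just that
arithmetic, nothing more:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double entries = pow(2.0, 18);   /* 2^18 hash entries                 */
    double stored  = 32.0;           /* bits of hash code stored per entry */
    double rehash  = 2.0;            /* probes tried per new position      */

    /* Effective hash-code space: index bits + stored bits = 18 + 32 = 50. */
    double codes = pow(2.0, 18.0 + stored);

    /* Probability that a brand-new position collides with a stored entry. */
    double pc = rehash * entries / codes;

    printf("pc                      = 2^%g\n", log2(pc));       /* 2^-31 */
    printf("expected probes to fail = 2^%g\n", log2(1.0 / pc)); /* 2^31  */
    return 0;
}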