Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hash collisions and a little maths

Author: Rémi Coulom

Date: 03:11:22 02/24/99

Go up one level in this thread


On February 18, 1999 at 21:06:53, Vincent Diepeveen wrote:

[...]
>
>Cool that there is someone who can fiddle with math.
>
>Can you change all this to a function that
>gives the estimated number of hash collisions with inputs:
>
>  - #bits
>  - size of hashtable
>  - loading factor of hashtable
>
>Many thanks,
>Vincent
>
[...]

If you assume that hash codes are independant and identically distributed (which
is wrong, but might be a good approximation for this purpose), and that your
hash table is always full (which is the case if you do not clear it after each
move), then the probability to have a collision for a new position (that is not
already in the table) is equal to pc = (size of hash table) / (number of hash
codes). You will have to multiply this probability by the number of rehash you
do if you use one part of the hash code for indexing and this part is not stored
in the hash entry for comparison.

Under these hypotheses, we have a Poisson process and the average time before
the first collision will be 1/pc probes. Note that these are probes with
original positions, and does not count positions that are already in the hash
table.

For instance, TCB crashed on a hash collision last week-end, as it was using
2^18 hash entries, each of them containing 32 bits of hash codes, with a rehash
of 2. I had pc = 2 * 2^18 / 2^(18 + 32) hence 1/pc = 2^31, which makes
collisions very likely to happen in a long match.

I think that testing the legality of the hash move is a good way to reduce pc
significantly. I also think that using the Zobrist technique for computing hash
codes makes collision less likely since hash codes are not independent and
identically distributed : the closer the positions, the less likely a collision.
A good choice of hash keys can help also (see my post about BCH codes).

A good way to measure all of this would be to make experiments with a small
number of bits of hash code and measure the collision rate with different
techniques : with or without test on the hash move, with random or BCH hash
keys, etc. If I find time to do it one day, I will post the results here.

Remi



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.