Computer Chess Club Archives




Subject: Hash collisions and a little maths

Author: Rémi Coulom

Date: 03:32:12 02/18/99

I just calculated the probability to have two identical values in a list of p
values chosen at random among q possible values. Maybe everybody knew this
already, but I was ignorant of the result and I find it is interesting. The
result is, for large values of q and p < q, equivalent to exp(-(p*p)/(2*q)). It
means that the probability of collision becomes high as soon as p is close to
the square root of q.

A consequence of this is that if you take 1,000 people, you have a very high
probability that two of them have the exact same number of hair on their heads,
assuming that the order of magnitude of this number is 100,000.

A less funny consequence for computer chess programmers is that if you use 2^64
possible values for hash codes, you have a 1% probability of collision for 600 M
values chosen at random. Of course, hash codes in a search tree are not
independant and identically distributed.

More experienced programmer maybe knew this since long, but I did not after 4
years of using a hash table and I thought some of you might be interested.

By the way, I wonder how big were the hash codes in Deep Blue.


This page took 0.04 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.