Computer Chess Club Archives


Search

Terms

Messages

Subject: Generalized estimate of hashing collision rate

Author: Bruce Cleaver

Date: 07:05:15 12/11/02


After reading about the subject of hash collisions (what are the

chances of a hash collision during a game?) I thought about it and

suddenly realized it was similar to the so-called 'Birthday problem'

(how many people must be gathered together to have a 50% chance that at

least two people share the same birthday?).  The Birthday problem is

interesting because people assume the answer is 183 (i.e. 365/2) to get

a 50% chance, but in fact the answer is only 23!  With 38 people it

climbs to about 90%. 100% chance requires 366 people.  Rather

counter-intuitive.

At any rate, the hash-collision problem is similar in form.


Assumptions are:

1) hash signature is 64 bits.
2) hashing function is well written so that it is uniformly distributed
over the space of 2^64.
3) All hash nodes are kept and not replaced.  I realize this does not

accord with actual practice, but cut me some slack here.  Gotta walk

before running.  Maybe I can ginn up an answer with replacement later.


The generalized form for a hashing collision after N nodes (all kept)

is:

1 - [((2^64 - 1)/2^64)*((2^64-2)/2^64))*(...)*((2^64-N)/2^64)]

'Simply' pick N, loop thru N times and see what the answer is.  IF you

wish to see what the value of N is for 50%, that will require a certain

number (probably on the order of 10^11 nodes).  For a 90% chance, the

number N must be bigger, etc.

Now comes a problem - this probably requires extended-accuracy

arithmetic (128 bits or more) to perform correctly.  Can anybody here

help out?

Bruce



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.