# Computer Chess Club Archives

## Messages

### Subject: Re: Hash collisions (was Re: Let's analyze move 36)

Author: Dan Homan

Date: 20:15:18 10/08/97

Go up one level in this thread

```
On October 08, 1997 at 21:08:43, Robert Hyatt wrote:

>On October 08, 1997 at 15:20:38, Dan Homan wrote:
>
>>On October 08, 1997 at 13:49:14, Chris Whittington wrote:
>>
>>
>>>
>>>Ok let's try my maths model. tell me if it predicts the results.
>>>
>>>Node rate N nodes per sec
>>>Hash lookups N/4 hits per sec
>>>
>>>Chance of hash collision is related to number of bits in hash signature,
>>>suppose we have n bits. Collision chance per pair of hash hash lookups =
>>>1 / 2^n
>>>
>>>The hash collision rate is unfortunateyl going to be proportional to
>>>(roughly) the SQUARE of the hash lookup rate.
>>>
>>>Total collisions per second = 1 / 2^n  x N/4 x N/4
>>>
>>> = N^2 / 2^n / 16
>>>
>>>ideally we could go for, say, only one hash collision per week,
>>>
>>>Someone with a calculator and some spare time ....
>>>
>>>How big  a hash bit count would we need for a 10,000, a 100,000 and a
>>>2,000,000 nps program ?
>>>
>>>Alternatively and additionally, for a 64 bit sized hash index, what
>>>collision rate will a 10,000, a 100,000 and a 2,000,000 nps program give
>>>?
>>>
>>>Does this agree with Bob's empirical results ?
>>>
>>>Chris
>>
>>Shouldn't this also depend on the size of the table?  Including table
>>size, my calculation is a little different...
>>
>>P = probability of a collision on a given table lookup
>>T = total number of table entries
>>p = probability of two positions having the same hash
>>    signature ( = 1/2^n , as chris stated, with n = number of bits in
>>hash)
>>N = nodes searched per second (assume one table lookup at each node)
>>R = rate of collisions
>>
>>So
>>
>>P = 1 - (1-p)^T  (This is just 1 minus the probablity of no collision)
>>
>>if (1/p) >> T  the P = Tp (approximately)
>>
>>The rate R is then given by....
>>
>>R = NP
>>
>>for a 256K entry table, 32 bit hash code, 30K nodes / sec
>>  (i.e.  T = 256000, n = 32, N = 30000/sec )
>>
>> R = 1.78 collisions/sec
>>
>>for a 48 bit hash code: R ~ 3E-5 collisions/sec
>>for a 64 bit hash code: R ~ 4E-10 collisions/sec
>>
>>T should be the number of hash table entries that are filled, so the
>>rate
>>of collisions is less than this for a partially filled table.
>>
>> - Dan
>>
>>P.S.  This was done quick, so somebody please check my math.
>>
>>---------------------------------------------------------------------
>>Dan Homan                               dch@vlbi.astro.brandeis.edu
>>Physics Department                      dch@quasar.astro.brandeis.edu
>>Brandeis University                     homan@binah.cc.brandeis.edu
>>---------------------------------------------------------------------
>>       http://quasar.astro.brandeis.edu/BRAG/people/dch.html
>>---------------------------------------------------------------------
>
>That is damned good.  Because with 32 bit hash keys, 1 per second is not
>something that strikes me as being far off from what I was seeing when
>Stanback and I were discussing this a while back.  Need to get him here
>too as he might recall his data better.  But this seems to be maybe
>right
>on the mark for 30K nps and 32 bits...  and for 64 bits, because I was
>only seeing one every now and then when I tested that...

Nice to hear that my "back of the envelope" calculation is in the
ballpark.

It occured to me this evening that I put in a few un-mentioned
assumptions,
For example, I assumed the hash table entries are independant of the
hash
code, which they aren't because the hash code is used (in part) for the
addressing.  I also assumed that the hash codes in a given search tree
are
truly random.

For getting ballpark numbers, these assumptions are pretty un-important,
but
the scientist in me just had to mention them :)

- Dan

p.s.  My intuition says that a more detailed calculation will still give
a rate
pretty close to  R = NT/(2^n).

```