Computer Chess Club Archives




Subject: Re: Hash collisions (was Re: Let's analyze move 36)

Author: Dan Homan

Date: 20:15:18 10/08/97

Go up one level in this thread

On October 08, 1997 at 21:08:43, Robert Hyatt wrote:

>On October 08, 1997 at 15:20:38, Dan Homan wrote:
>>On October 08, 1997 at 13:49:14, Chris Whittington wrote:
>>>Ok let's try my maths model. tell me if it predicts the results.
>>>Node rate N nodes per sec
>>>Hash lookups N/4 hits per sec
>>>Chance of hash collision is related to number of bits in hash signature,
>>>suppose we have n bits. Collision chance per pair of hash hash lookups =
>>>1 / 2^n
>>>The hash collision rate is unfortunateyl going to be proportional to
>>>(roughly) the SQUARE of the hash lookup rate.
>>>Total collisions per second = 1 / 2^n  x N/4 x N/4
>>> = N^2 / 2^n / 16
>>>ideally we could go for, say, only one hash collision per week,
>>>Someone with a calculator and some spare time ....
>>>How big  a hash bit count would we need for a 10,000, a 100,000 and a
>>>2,000,000 nps program ?
>>>Alternatively and additionally, for a 64 bit sized hash index, what
>>>collision rate will a 10,000, a 100,000 and a 2,000,000 nps program give
>>>Does this agree with Bob's empirical results ?
>>Shouldn't this also depend on the size of the table?  Including table
>>size, my calculation is a little different...
>>P = probability of a collision on a given table lookup
>>T = total number of table entries
>>p = probability of two positions having the same hash
>>    signature ( = 1/2^n , as chris stated, with n = number of bits in
>>N = nodes searched per second (assume one table lookup at each node)
>>R = rate of collisions
>>P = 1 - (1-p)^T  (This is just 1 minus the probablity of no collision)
>>if (1/p) >> T  the P = Tp (approximately)
>>The rate R is then given by....
>>R = NP
>>for a 256K entry table, 32 bit hash code, 30K nodes / sec
>>  (i.e.  T = 256000, n = 32, N = 30000/sec )
>> R = 1.78 collisions/sec
>>for a 48 bit hash code: R ~ 3E-5 collisions/sec
>>for a 64 bit hash code: R ~ 4E-10 collisions/sec
>>T should be the number of hash table entries that are filled, so the
>>of collisions is less than this for a partially filled table.
>> - Dan
>>P.S.  This was done quick, so somebody please check my math.
>>Dan Homan                     
>>Physics Department            
>>Brandeis University           
>That is damned good.  Because with 32 bit hash keys, 1 per second is not
>something that strikes me as being far off from what I was seeing when
>Stanback and I were discussing this a while back.  Need to get him here
>too as he might recall his data better.  But this seems to be maybe
>on the mark for 30K nps and 32 bits...  and for 64 bits, because I was
>only seeing one every now and then when I tested that...

Nice to hear that my "back of the envelope" calculation is in the

It occured to me this evening that I put in a few un-mentioned
For example, I assumed the hash table entries are independant of the
code, which they aren't because the hash code is used (in part) for the
addressing.  I also assumed that the hash codes in a given search tree
truly random.

For getting ballpark numbers, these assumptions are pretty un-important,
the scientist in me just had to mention them :)

 - Dan

p.s.  My intuition says that a more detailed calculation will still give
a rate
pretty close to  R = NT/(2^n).

This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.