Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hashkey collisions (typical numbers)

Author: Sune Fischer

Date: 15:35:06 04/07/04

Go up one level in this thread


On April 07, 2004 at 18:03:24, martin fierz wrote:

>On April 07, 2004 at 11:59:58, Sune Fischer wrote:
>
>>>>It is quite similar from the point on where the hash table is full.
>>>
>>>absolutely not!
>>>the probability that *any* two 64-bit signatures are the same goes from 0...1 as
>>>you increase table size and is about 0.5 at 2^32 entries (and goes to 1 very
>>>rapidly from there).
>>
>>I suspect you may in the heat of the moment have thought
>>that 2^32 is half of 2^64?
>
>mmmh, if i thought that they would have thrown me out of university after a
>year. so no, i didn't think that, and i don't think that even when tired or
>drunk :-)

Ok, you never know, that would have explained a lot :)

>no, the formula above stems from the fact that if you have a generalized
>birthday problem with N days in a year and M pupils, you need approximately M =
>sqrt(N) pupils to get a probability of 0.5 that two of them have their birthday
>on the same day. which is just applied above.

Yes approximately.
The true formula is a nasty product of terms with big factorials.

>>>the collision probability is given only by the number of independent
>>>verification bits that you have remaining in your hash signature.
>>>
>>>cheers
>>>  martin
>>
>>Yes it's not _the_ birthday problem because we are searching for a different
>>answer, but it is the same setup.
>>
>>If you have 363 kids in the class and add another kid chance his birthday will
>>collide is very close to 1.
>
>it's *not* the same setup. because we are *never* anywhere near what you just
>described. your example translated to a chess program would mean that you have
>close to 2^64 entries in your hashtable, which you don't. instead, you have a
>measly 2^20.

Yeah but the numbers aren't important.

To make the analog more obvious we can pretend we have a hash table size of 2^64
and then ask how many we need to fill to get a collision probability of 50%.
This would be identical to the birthday problem.

>what is also different is that the collision frequency is a totally different
>measure than the birthday problem. i already said that, but it seems i have to
>explain: the birthday problem asks: "how likely is it that *any* two pupils will
>have their birthday on the same day?" while the hash collision frequency asks
>"if i add one more pupil, how likely is he to have birthday on the same day as
>any of the pupils i have up to now?". so the birthday problem is the integral
>over the collision frequency, very, very different... i hope i explained it
>properly this time.

Thanks for the explanation, but I was just talking about the model behind it all
:)

-S.
>cheers
>  martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.