Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hashkey collisions (typical numbers)

Author: Sune Fischer

Date: 07:54:57 04/07/04

Go up one level in this thread


On April 07, 2004 at 10:37:28, martin fierz wrote:

>On April 07, 2004 at 10:21:38, Sune Fischer wrote:
>
>>>>>
>>>>>your number is much too high. a good estimate for the hash collision probability
>>>>>is 1/sqrt(hashkey_range); which in your case comes out as 2^-32 or about
>>>>>somewhere around one in a billion...
>>>>
>>>>Not at all. It's way less than that with 64 bits Zobrist.
>>>
>>>i never took the trouble to measure it, but the math is rather straightforward.
>>>so i'd say you have an implementation bug too ;-)
>>>
>>>cheers
>>>  martin
>>
>>Not necessarily. If he has a very large hash table with more than 2^32 entries
>>the probability of a collision increases rapidly with 64 bit keys.
>
>you are right of course that vincent should have given the size of his
>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking
>48GB here. i hardly think his hashtable has that kind of size. then again, with
>his supercomputer stuff, who knows?
>i tacitly assumed that he didn't have that kind of table size.

I think he might have been using something silly like that on the computer.

>besides that, it doesn't really matter: you are probably interested in a
>collision frequency, e.g. 1 collision per X lookups. the birthday problem
>doesn't really apply to this: it gives you the probability that *any* two
>hashkeys in your table are identical. this probabiltiy has a different behavior
>than the actual collision rate...

It is quite similar from the point on where the hash table is full.

If you have 2^64 different keys and 2^20 of these are already used in the table,
the probability for the next one (assuming it is different) is going to be 2^46,
ie. a frequency of 2^-46.

While you are still filling the table the probabilities are lower of course due
to "fewer kids in the class".

Only new keys not already stored in the table can collide.
So to get the real frequency one should not count the true hash hits, and there
will be lots of those too so collision rate is a bit lower.

Another reducing factor is present if you don't probe/store qsearch, since
collisions on these nodes aren't possible either.

-S.
>cheers
>  martin
>
>



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.