Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hashkey collisions (typical numbers)

Author: Robert Hyatt

Date: 08:01:58 04/07/04

Go up one level in this thread


On April 07, 2004 at 10:54:57, Sune Fischer wrote:

>On April 07, 2004 at 10:37:28, martin fierz wrote:
>
>>On April 07, 2004 at 10:21:38, Sune Fischer wrote:
>>
>>>>>>
>>>>>>your number is much too high. a good estimate for the hash collision probability
>>>>>>is 1/sqrt(hashkey_range); which in your case comes out as 2^-32 or about
>>>>>>somewhere around one in a billion...
>>>>>
>>>>>Not at all. It's way less than that with 64 bits Zobrist.
>>>>
>>>>i never took the trouble to measure it, but the math is rather straightforward.
>>>>so i'd say you have an implementation bug too ;-)
>>>>
>>>>cheers
>>>>  martin
>>>
>>>Not necessarily. If he has a very large hash table with more than 2^32 entries
>>>the probability of a collision increases rapidly with 64 bit keys.
>>
>>you are right of course that vincent should have given the size of his
>>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking
>>48GB here. i hardly think his hashtable has that kind of size. then again, with
>>his supercomputer stuff, who knows?
>>i tacitly assumed that he didn't have that kind of table size.
>
>I think he might have been using something silly like that on the computer.
>
>>besides that, it doesn't really matter: you are probably interested in a
>>collision frequency, e.g. 1 collision per X lookups. the birthday problem
>>doesn't really apply to this: it gives you the probability that *any* two
>>hashkeys in your table are identical. this probabiltiy has a different behavior
>>than the actual collision rate...
>
>It is quite similar from the point on where the hash table is full.
>
>If you have 2^64 different keys and 2^20 of these are already used in the table,
>the probability for the next one (assuming it is different) is going to be 2^46,
>ie. a frequency of 2^-46.
>
>While you are still filling the table the probabilities are lower of course due
>to "fewer kids in the class".
>
>Only new keys not already stored in the table can collide.
>So to get the real frequency one should not count the true hash hits, and there
>will be lots of those too so collision rate is a bit lower.
>
>Another reducing factor is present if you don't probe/store qsearch, since
>collisions on these nodes aren't possible either.
>

Another issus is how the signatures are created.   They are not completely
random, they are related, in that sig(plyx) is based on sig(plyx-1) with some
bits changed.  That is a different assumption than in the birthday paradox...




>-S.
>>cheers
>>  martin
>>
>>



This page took 0.19 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.