Computer Chess Club Archives



Subject: Re: Hashkey collisions (typical numbers)

Author: Robert Hyatt

Date: 09:05:46 04/07/04



On April 07, 2004 at 11:20:18, martin fierz wrote:

>On April 07, 2004 at 10:54:57, Sune Fischer wrote:
>
>>On April 07, 2004 at 10:37:28, martin fierz wrote:
>>
>>>On April 07, 2004 at 10:21:38, Sune Fischer wrote:
>>>
>>>>>>>
>>>>>>>your number is much too high. a good estimate for the hash collision probability
>>>>>>>is 1/sqrt(hashkey_range), which in your case comes out as 2^-32, or somewhere
>>>>>>>around one in four billion...
>>>>>>
>>>>>>Not at all. It's way less than that with 64 bits Zobrist.
>>>>>
>>>>>i never took the trouble to measure it, but the math is rather straightforward.
>>>>>so i'd say you have an implementation bug too ;-)
>>>>>
>>>>>cheers
>>>>>  martin
>>>>
>>>>Not necessarily. If he has a very large hash table with more than 2^32 entries
>>>>the probability of a collision increases rapidly with 64 bit keys.
>>>
>>>you are right of course that vincent should have given the size of his
>>>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking
>>>48GB here. i hardly think his hashtable has that kind of size. then again, with
>>>his supercomputer stuff, who knows?
>>>i tacitly assumed that he didn't have that kind of table size.
>>
>>I think he might have been using something silly like that on the computer.
>>
>>>besides that, it doesn't really matter: you are probably interested in a
>>>collision frequency, e.g. 1 collision per X lookups. the birthday problem
>>>doesn't really apply to this: it gives you the probability that *any* two
>>>hashkeys in your table are identical. this probability has a different behavior
>>>than the actual collision rate...
>>
>>It is quite similar from the point on where the hash table is full.
>
>absolutely not!
>the probability that *any* two 64-bit signatures are the same goes from 0...1 as
>you increase table size and is about 0.5 at 2^32 entries (and goes to 1 very
>rapidly from there).
>the collision probability is given only by the number of independent
>verification bits that you have remaining in your hash signature.
>
>cheers
>  martin

I don't totally agree.  You are basing this on a slightly twisted definition.
I.e., "collision rate is proportional to table size" is wrong, from the point of
view that if the table size is 2^64, the collision rate will be no higher than
if the table size is 2^40.  Why?  Because we can't fill a 2^40 table, so we
certainly can't fill a 2^64 table.  So while this sounds good in theory, in
practice it isn't quite the whole story...
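
As a quick check on the birthday-problem numbers quoted above: the standard
approximation for the chance that any two of n independent 64-bit signatures
coincide is p = 1 - exp(-n(n-1)/2^65).  A minimal sketch (the helper name is
made up for illustration):

```python
import math

def p_any_collision(n, bits=64):
    """Birthday-bound approximation: probability that any two of n
    independently drawn keys from a 2**bits space coincide."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2.0 ** bits))

for n in (2 ** 20, 2 ** 28, 2 ** 32):
    print(f"n = 2^{n.bit_length() - 1}: p ~ {p_any_collision(n):.3g}")
```

This gives p ~ 0.39 at n = 2^32 entries (the exact 50% point is near
1.18 * 2^32), consistent with the "about 0.5 at 2^32 entries" figure, while a
2^20-entry table is down around 3e-8.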

Then there is the issue of how much a collision will hurt.  I can tell you right
now that one collision every 7M probes will have _zero_ effect on the search
result.  I have data from two different programs to support this (the paper is
coming along and I will probably make it available electronically as a technical
report before it is actually published.)


>
>>
>>If you have 2^64 different keys and 2^20 of these are already used in the table,
>>the probability for the next one (assuming it is different) is going to be about
>>2^20/2^64 = 2^-44, i.e. one collision in 2^44 probes.
>>
>>While you are still filling the table the probabilities are lower of course due
>>to "fewer kids in the class".
>>
>>Only new keys not already stored in the table can collide.
>>So to get the real frequency one should not count the true hash hits; there
>>will be lots of those too, so the collision rate is a bit lower.
>>
>>Another reducing factor is present if you don't probe/store qsearch, since
>>collisions on these nodes aren't possible either.
>>
>>-S.
>>>cheers
>>>  martin
>>>
>>>
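
The per-probe rate in that last estimate is just a union bound: with t distinct
positions already stored, a new and genuinely different position matches one of
the stored 64-bit signatures with probability about t/2^64.  A sketch (the
function name is made up):

```python
import math

def per_probe_rate(t, bits=64):
    """Union-bound estimate of the false-match (collision) probability
    for one probe against t stored signatures of the given width."""
    return t / 2.0 ** bits

for t in (2 ** 20, 2 ** 26, 2 ** 32):
    print(f"t = 2^{t.bit_length() - 1}: rate ~ 2^{math.log2(per_probe_rate(t)):.0f}")
```

Even a completely full 2^32-entry table only reaches about 2^-32 per probe,
which is why the collision *frequency* stays tiny even when the birthday
probability that some duplicate exists in the table is near 1.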




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.