Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hashkey collisions (typical numbers)

Author: martin fierz

Date: 14:49:12 04/07/04

Go up one level in this thread


On April 07, 2004 at 12:05:46, Robert Hyatt wrote:

>On April 07, 2004 at 11:20:18, martin fierz wrote:
>
>>On April 07, 2004 at 10:54:57, Sune Fischer wrote:
>>
>>>On April 07, 2004 at 10:37:28, martin fierz wrote:
>>>
>>>>On April 07, 2004 at 10:21:38, Sune Fischer wrote:
>>>>
>>>>>>>>
>>>>>>>>your number is much too high. a good estimate for the hash collision probability
>>>>>>>>is 1/sqrt(hashkey_range); which in your case comes out as 2^-32 or about
>>>>>>>>somewhere around one in a billion...
>>>>>>>
>>>>>>>Not at all. It's way less than that with 64 bits Zobrist.
>>>>>>
>>>>>>i never took the trouble to measure it, but the math is rather straightforward.
>>>>>>so i'd say you have an implementation bug too ;-)
>>>>>>
>>>>>>cheers
>>>>>>  martin
>>>>>
>>>>>Not necessarily. If he has a very large hash table with more than 2^32 entries
>>>>>the probability of a collision increases rapidly with 64 bit keys.
>>>>
>>>>you are right of course that vincent should have given the size of his
>>>>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking
>>>>48GB here. i hardly think his hashtable has that kind of size. then again, with
>>>>his supercomputer stuff, who knows?
>>>>i tacitly assumed that he didn't have that kind of table size.
>>>
>>>I think he might have been using something silly like that on the computer.
>>>
>>>>besides that, it doesn't really matter: you are probably interested in a
>>>>collision frequency, e.g. 1 collision per X lookups. the birthday problem
>>>>doesn't really apply to this: it gives you the probability that *any* two
>>>>hashkeys in your table are identical. this probabiltiy has a different behavior
>>>>than the actual collision rate...
>>>
>>>It is quite similar from the point on where the hash table is full.
>>
>>absolutely not!
>>the probability that *any* two 64-bit signatures are the same goes from 0...1 as
>>you increase table size and is about 0.5 at 2^32 entries (and goes to 1 very
>>rapidly from there).
>>the collision probability is given only by the number of independent
>>verification bits that you have remaining in your hash signature.
>>
>>cheers
>>  martin
>
>I don't totally agree.  You are basing this on a slightly twisted definition.
>IE table size is proportional to collision rate is wrong, from the point of view
>that if the table size is 2^64, the collision rate will be no higher than if the
>table size is 2^40.  Why?  Because we can't fill a 2^40 table, so we certainly
>can't fill a 2^64 table.  So while this sounds good in theory, in practice it
>isn't quite the whole story...

well, it is of course meant for full hashtables and sensible hashtable sizes.
for that it is probably more or less correct. and with correct, i just mean the
order of magnitude. i'm a physicist :-)

cheers
  martin


>
>Then there is the issue of how much a collision will hurt.  I can tell you right
>now that one collision every 7M probes will have _zero_ effect on the search
>result.  I have data from two different programs to support this (the paper is
>coming along and I will probably make it available electronically as a technical
>report before it is actually published.)

i know - you have posted this many times before  :-)
it's true of course, but we are discussing frequency of hash collisions here.
not whether they are relevant :-)

cheers
  martin


>
>
>>
>>>
>>>If you have 2^64 different keys and 2^20 of these are already used in the table,
>>>the probability for the next one (assuming it is different) is going to be 2^46,
>>>ie. a frequency of 2^-46.
>>>
>>>While you are still filling the table the probabilities are lower of course due
>>>to "fewer kids in the class".
>>>
>>>Only new keys not already stored in the table can collide.
>>>So to get the real frequency one should not count the true hash hits, and there
>>>will be lots of those too so collision rate is a bit lower.
>>>
>>>Another reducing factor is present if you don't probe/store qsearch, since
>>>collisions on these nodes aren't possible either.
>>>
>>>-S.
>>>>cheers
>>>>  martin
>>>>
>>>>



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.