Author: Robert Hyatt
Date: 09:05:46 04/07/04
On April 07, 2004 at 11:20:18, martin fierz wrote:

>On April 07, 2004 at 10:54:57, Sune Fischer wrote:
>
>>On April 07, 2004 at 10:37:28, martin fierz wrote:
>>
>>>On April 07, 2004 at 10:21:38, Sune Fischer wrote:
>>>
>>>>>>>
>>>>>>>your number is much too high. a good estimate for the hash collision probability
>>>>>>>is 1/sqrt(hashkey_range), which in your case comes out as 2^-32, or
>>>>>>>somewhere around one in a billion...
>>>>>>
>>>>>>Not at all. It's way less than that with a 64-bit Zobrist key.
>>>>>
>>>>>i never took the trouble to measure it, but the math is rather straightforward.
>>>>>so i'd say you have an implementation bug too ;-)
>>>>>
>>>>>cheers
>>>>>  martin
>>>>
>>>>Not necessarily. If he has a very large hash table with more than 2^32 entries,
>>>>the probability of a collision increases rapidly with 64-bit keys.
>>>
>>>you are right of course that vincent should have given the size of his
>>>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking
>>>48GB here. i hardly think his hashtable has that kind of size. then again, with
>>>his supercomputer stuff, who knows?
>>>i tacitly assumed that he didn't have that kind of table size.
>>
>>I think he might have been using something silly like that on the computer.
>>
>>>besides that, it doesn't really matter: you are probably interested in a
>>>collision frequency, e.g. 1 collision per X lookups. the birthday problem
>>>doesn't really apply to this: it gives you the probability that *any* two
>>>hashkeys in your table are identical. this probability has a different behavior
>>>than the actual collision rate...
>>
>>It is quite similar from the point where the hash table is full.
>
>absolutely not!
>the probability that *any* two 64-bit signatures are the same goes from 0 to 1 as
>you increase table size, is about 0.5 at 2^32 entries, and goes to 1 very
>rapidly from there.
>the collision probability is given only by the number of independent
>verification bits that you have remaining in your hash signature.
>
>cheers
>  martin

I don't totally agree. You are basing this on a slightly twisted definition,
i.e. that collision rate is proportional to table size. That is wrong, in the
sense that if the table size is 2^64, the collision rate will be no higher than
if the table size is 2^40. Why? Because we can't fill a 2^40 table, so we
certainly can't fill a 2^64 table; it is the number of occupied entries, not
the nominal capacity, that drives the rate. So while this sounds good in
theory, in practice it isn't quite the whole story...

Then there is the issue of how much a collision will hurt. I can tell you right
now that one collision every 7M probes will have _zero_ effect on the search
result. I have data from two different programs to support this (the paper is
coming along, and I will probably make it available electronically as a
technical report before it is actually published).

>
>>
>>If you have 2^64 different keys and 2^20 of these are already used in the table,
>>the probability of a collision on the next one (assuming it is a different
>>position) is 2^20/2^64, i.e. a frequency of 2^-44.
>>
>>While you are still filling the table the probabilities are lower, of course,
>>due to "fewer kids in the class".
>>
>>Only new keys not already stored in the table can collide.
>>So to get the real frequency one should not count the true hash hits, and there
>>will be lots of those too, so the collision rate is a bit lower.
>>
>>Another reducing factor is present if you don't probe/store in qsearch, since
>>collisions at those nodes aren't possible either.
>>
>>-S.
>>>cheers
>>>  martin
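To make the distinction concrete, here is a minimal C sketch (my own
illustration, not from any engine; it assumes independent, uniformly
distributed 64-bit keys, and the function names are made up) that contrasts
the birthday-problem probability, i.e. the chance that *any* two of the n
signatures seen so far are identical, with the per-probe collision rate that
actually matters during a search:

#include <stdio.h>
#include <math.h>

/* 2^64 as a double; exact, since 2^64 is a power of two. */
#define KEY_SPACE 18446744073709551616.0

/* Birthday bound: probability that ANY two of n random 64-bit
   keys are identical (the quantity martin is talking about). */
static double birthday_prob(double n)
{
    return 1.0 - exp(-n * (n - 1.0) / (2.0 * KEY_SPACE));
}

/* Per-probe rate: probability that ONE new random key matches
   one of the n distinct keys already sitting in the table
   (the quantity Sune is talking about). */
static double per_probe_prob(double n)
{
    return n / KEY_SPACE;
}

int main(void)
{
    double log2n[] = { 20.0, 26.0, 32.0 };   /* log2 of entry counts */
    for (int i = 0; i < 3; i++) {
        double n = pow(2.0, log2n[i]);
        printf("n = 2^%.0f: birthday = %.3g, per-probe = %.3g\n",
               log2n[i], birthday_prob(n), per_probe_prob(n));
    }
    return 0;
}

Compile with something like "cc birthday.c -lm". For 2^20 occupied entries
the per-probe rate prints as about 6e-14, i.e. 2^-44, matching the figure
quoted above, while the birthday probability at 2^32 keys comes out near 0.4
and climbs rapidly toward 1 from there. The two curves answer different
questions, which is the point of the disagreement in this thread.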