Author: martin fierz
Date: 14:49:12 04/07/04
Go up one level in this thread
On April 07, 2004 at 12:05:46, Robert Hyatt wrote: >On April 07, 2004 at 11:20:18, martin fierz wrote: > >>On April 07, 2004 at 10:54:57, Sune Fischer wrote: >> >>>On April 07, 2004 at 10:37:28, martin fierz wrote: >>> >>>>On April 07, 2004 at 10:21:38, Sune Fischer wrote: >>>> >>>>>>>> >>>>>>>>your number is much too high. a good estimate for the hash collision probability >>>>>>>>is 1/sqrt(hashkey_range); which in your case comes out as 2^-32 or about >>>>>>>>somewhere around one in a billion... >>>>>>> >>>>>>>Not at all. It's way less than that with 64 bits Zobrist. >>>>>> >>>>>>i never took the trouble to measure it, but the math is rather straightforward. >>>>>>so i'd say you have an implementation bug too ;-) >>>>>> >>>>>>cheers >>>>>> martin >>>>> >>>>>Not necessarily. If he has a very large hash table with more than 2^32 entries >>>>>the probability of a collision increases rapidly with 64 bit keys. >>>> >>>>you are right of course that vincent should have given the size of his >>>>hashtable. but: 2^32 = 4 billion, with 12 bytes per entry (?) you are talking >>>>48GB here. i hardly think his hashtable has that kind of size. then again, with >>>>his supercomputer stuff, who knows? >>>>i tacitly assumed that he didn't have that kind of table size. >>> >>>I think he might have been using something silly like that on the computer. >>> >>>>besides that, it doesn't really matter: you are probably interested in a >>>>collision frequency, e.g. 1 collision per X lookups. the birthday problem >>>>doesn't really apply to this: it gives you the probability that *any* two >>>>hashkeys in your table are identical. this probabiltiy has a different behavior >>>>than the actual collision rate... >>> >>>It is quite similar from the point on where the hash table is full. >> >>absolutely not! >>the probability that *any* two 64-bit signatures are the same goes from 0...1 as >>you increase table size and is about 0.5 at 2^32 entries (and goes to 1 very >>rapidly from there). >>the collision probability is given only by the number of independent >>verification bits that you have remaining in your hash signature. >> >>cheers >> martin > >I don't totally agree. You are basing this on a slightly twisted definition. >IE table size is proportional to collision rate is wrong, from the point of view >that if the table size is 2^64, the collision rate will be no higher than if the >table size is 2^40. Why? Because we can't fill a 2^40 table, so we certainly >can't fill a 2^64 table. So while this sounds good in theory, in practice it >isn't quite the whole story... well, it is of course meant for full hashtables and sensible hashtable sizes. for that it is probably more or less correct. and with correct, i just mean the order of magnitude. i'm a physicist :-) cheers martin > >Then there is the issue of how much a collision will hurt. I can tell you right >now that one collision every 7M probes will have _zero_ effect on the search >result. I have data from two different programs to support this (the paper is >coming along and I will probably make it available electronically as a technical >report before it is actually published.) i know - you have posted this many times before :-) it's true of course, but we are discussing frequency of hash collisions here. not whether they are relevant :-) cheers martin > > >> >>> >>>If you have 2^64 different keys and 2^20 of these are already used in the table, >>>the probability for the next one (assuming it is different) is going to be 2^46, >>>ie. a frequency of 2^-46. >>> >>>While you are still filling the table the probabilities are lower of course due >>>to "fewer kids in the class". >>> >>>Only new keys not already stored in the table can collide. >>>So to get the real frequency one should not count the true hash hits, and there >>>will be lots of those too so collision rate is a bit lower. >>> >>>Another reducing factor is present if you don't probe/store qsearch, since >>>collisions on these nodes aren't possible either. >>> >>>-S. >>>>cheers >>>> martin >>>> >>>>
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.