Author: Dan Homan
Date: 20:15:18 10/08/97
Go up one level in this thread
On October 08, 1997 at 21:08:43, Robert Hyatt wrote: >On October 08, 1997 at 15:20:38, Dan Homan wrote: > >>On October 08, 1997 at 13:49:14, Chris Whittington wrote: >> >> >>> >>>Ok let's try my maths model. tell me if it predicts the results. >>> >>>Node rate N nodes per sec >>>Hash lookups N/4 hits per sec >>> >>>Chance of hash collision is related to number of bits in hash signature, >>>suppose we have n bits. Collision chance per pair of hash hash lookups = >>>1 / 2^n >>> >>>The hash collision rate is unfortunateyl going to be proportional to >>>(roughly) the SQUARE of the hash lookup rate. >>> >>>Total collisions per second = 1 / 2^n x N/4 x N/4 >>> >>> = N^2 / 2^n / 16 >>> >>>ideally we could go for, say, only one hash collision per week, >>> >>>Someone with a calculator and some spare time .... >>> >>>How big a hash bit count would we need for a 10,000, a 100,000 and a >>>2,000,000 nps program ? >>> >>>Alternatively and additionally, for a 64 bit sized hash index, what >>>collision rate will a 10,000, a 100,000 and a 2,000,000 nps program give >>>? >>> >>>Does this agree with Bob's empirical results ? >>> >>>Chris >> >>Shouldn't this also depend on the size of the table? Including table >>size, my calculation is a little different... >> >>P = probability of a collision on a given table lookup >>T = total number of table entries >>p = probability of two positions having the same hash >> signature ( = 1/2^n , as chris stated, with n = number of bits in >>hash) >>N = nodes searched per second (assume one table lookup at each node) >>R = rate of collisions >> >>So >> >>P = 1 - (1-p)^T (This is just 1 minus the probablity of no collision) >> >>if (1/p) >> T the P = Tp (approximately) >> >>The rate R is then given by.... >> >>R = NP >> >>for a 256K entry table, 32 bit hash code, 30K nodes / sec >> (i.e. T = 256000, n = 32, N = 30000/sec ) >> >> R = 1.78 collisions/sec >> >>for a 48 bit hash code: R ~ 3E-5 collisions/sec >>for a 64 bit hash code: R ~ 4E-10 collisions/sec >> >>T should be the number of hash table entries that are filled, so the >>rate >>of collisions is less than this for a partially filled table. >> >> - Dan >> >>P.S. This was done quick, so somebody please check my math. >> >>--------------------------------------------------------------------- >>Dan Homan dch@vlbi.astro.brandeis.edu >>Physics Department dch@quasar.astro.brandeis.edu >>Brandeis University homan@binah.cc.brandeis.edu >>--------------------------------------------------------------------- >> http://quasar.astro.brandeis.edu/BRAG/people/dch.html >>--------------------------------------------------------------------- > >That is damned good. Because with 32 bit hash keys, 1 per second is not >something that strikes me as being far off from what I was seeing when >Stanback and I were discussing this a while back. Need to get him here >too as he might recall his data better. But this seems to be maybe >right >on the mark for 30K nps and 32 bits... and for 64 bits, because I was >only seeing one every now and then when I tested that... Nice to hear that my "back of the envelope" calculation is in the ballpark. It occured to me this evening that I put in a few un-mentioned assumptions, For example, I assumed the hash table entries are independant of the hash code, which they aren't because the hash code is used (in part) for the addressing. I also assumed that the hash codes in a given search tree are truly random. For getting ballpark numbers, these assumptions are pretty un-important, but the scientist in me just had to mention them :) - Dan p.s. My intuition says that a more detailed calculation will still give a rate pretty close to R = NT/(2^n).
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.