Author: Stuart Cracraft
Date: 14:17:45 01/21/98
Go up one level in this thread
I am currently "settled" on rehashing with 8 "slots" including the original slot. However, I have not noticed any major improvement or degradation of this compared to no rehashing and just using single slot. --Stuart On January 21, 1998 at 16:00:15, Don Dailey wrote: >Stuart, > >I think the best way to resolve this is to try both ways. This >issues can sometimes get very very trickly and non-intuitive. >For instance consider: > > 1.) With linear probing, there is a definite upper bound on > the number of probes you will have to apply to find an > empty location (assuming it exists.) > > 2.) With re-hashing any number of probes less than INFINITE > is possible (but not INFINITE itself!) > >This simple reasoning tells us the two are not equivalent. Here >at MIT I have access to lot's of experts on this subject (including >Ron Rivest and Charles Leiserson) and will try to pick their >brains on this subject. > >- Don > > > > >On January 21, 1998 at 09:03:26, Robert Hyatt wrote: > >>On January 21, 1998 at 05:10:53, Amir Ban wrote: >> >>>On January 20, 1998 at 18:42:01, Stuart Cracraft wrote: >>> >>>>On January 20, 1998 at 17:55:34, Amir Ban wrote: >>>> >>>>>On January 20, 1998 at 16:00:31, Bruce Moreland wrote: >>>>> >>>>>> >>>>>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote: >>>>>> >>>>>>>yes. it's old news. And a data structures course would call this >>>>>>>the "hash collision chain problem". rehashing eliminated chains, but >>>>>>>produces more cache thrashing... the latter is more important on a PC >>>>>>>that already has poor memory bandwidth... >>>>>> >>>>>>Assume hash keys are perfectly random, there should be no discernible >>>>>>relation between distinct positions, no matter how similar they are. >>>>>> >>>>>>So if you hash to index I in the table, there should be no more >>>>>>liklihood that there is an occupied entry at index I+1 than there is at >>>>>>index I+K or I+rand(), right? If there is complete entropy here, >>>>>>chaining should be uniform, regardless of how you rehash. So you should >>>>>>be able to use I+1, since that is easiest, if you really want to rehash. >>>>>> >>>>>>The fact that the hash elements are next to each other in memory, and >>>>>>can be accessed with an index, has nothing to do with anything, does it? >>>>>> There is no more relation betwen I and I+1 than there is between I and >>>>>>J. >>>>>> >>>>>>The above is my intuition. Intuition also tells me that the world is >>>>>>flat, though. >>>>>> >>>>>>bruce >>>>> >>>>> >>>>>I agree with that. >>>>> >>>>>Intutively I also think that there is no scope for improvement on the >>>>>linear method. Your argument makes the point. Maybe a stronger way to >>>>>say it is this: Complex probing methods are equivalent to selecting a >>>>>different hash function, or to rehashing the hashed value. If the >>>>>hashing is good to start with, this doesn't achieve anything. >>>>> >>>>>A different analogy: When you encrypt a message with some encryption >>>>>method, some people feel that the message would be REALLY secret if at >>>>>the end you multiply everything by 7 and add 510. That's not true, of >>>>>course. >>>>> >>>>>Amir >>>> >>>>I am not sure I can agree with Amir/Bruce yet. >>>> >>>>A single hash function used once that hashes to a given >>>>address and then use linear use the next 7 slots is going >>>>to "back up" a lot faster when multiple board positions hash >>>>to the same address. With random rehashing, they won't back >>>>up (as much) since the randomizer will disperse them throughout >>>>the table and not just in group of 7 slots after the main address. >>>> >>>>--Stuart >>> >>>That is correct, but in any other way you do this you will get the same >>>amount of competition for each slot with the same random distribution. >>>If you avoid the overloading of a slot through hash hits to the same >>>place, you will find that statistically it gets equally overloaded >>>through hash hits to different places. >>> >>>Amir >> >> >>This is partially correct, but only partially. Here is why. >> >>assume we use 8 probes into the hash, and that we simply use N, N+1, >>N+2, >>..., N+7 as the probe addresses. You'd like to think that P(anyentry) >>is >>1/X where X=number of entries, but this is not so... and you can easily >>test >>it in your program to confirm it. >> >>So what happens is that you probe to position P, and use the set of 8 >>entries there as candidates for replacement. Any position that hashes >>to >>P, or to P+1,2,3,4,5,6,7 tend to interfere with each other, because >>hashing >>to P+2, means we use the set P+2,3,4,5,6,7,8,9 which has 6 positions in >>common with the position that hashed to P. >> >>If you use random rehashing, based on extracting Y bits from the >>original >>hash signature, you will likely get two different strides for the >>positions >>that hashed to P and P+2. So the position that hashes to P, would >>likely >>have the set P, P+a, P+2a, etc... while the position that hashes to P+2 >>would have the set P+2, P+2+b, P+2+2b,... >> >>From hashing theory, this tends to eliminate "chaining".. because once >>you >>store at P or P+1, the probability of hitting that "chain" is 2/n, >>because >>that chain is of length 2, while the probability of hitting any other >>position >>(assuming table is empty except for P,P+1 is still 1/N. So once a chain >>gets >>established, the probability of hitting it is higher than that of >>hitting >>any single empty entry. >> >>So simple linear hashing works well, but, in those cases where the hash >>probes are not really very uniformly distributed, and by simple >>deduction >>our hashing algorithm is *not* going to uniformly distribute all hash >>keys, >>simply based on random sampling theory, random rehashing actually does >>help >>to distribute hash entries, even when the original hash indices are not >>well distributed. >> >>To test, clear your hash, set a counter to the number of positions you >>have, >>and on each probe, if you store over an empty entry increment a counter >>by >>one, and if you probe at a used position, increment a different counter. >>Compare the two to the thing telling you how many entries you have, and >>decide whether those keys are as uniform as you originally thought. >> >>I ran this a couple of years ago when we had the big hashing discussion >>in >>r.g.c.c. The results are interesting...
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.