Author: Robert Hyatt
Date: 06:03:26 01/21/98
On January 21, 1998 at 05:10:53, Amir Ban wrote:

>On January 20, 1998 at 18:42:01, Stuart Cracraft wrote:
>
>>On January 20, 1998 at 17:55:34, Amir Ban wrote:
>>
>>>On January 20, 1998 at 16:00:31, Bruce Moreland wrote:
>>>
>>>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote:
>>>>
>>>>>Yes, it's old news. A data structures course would call this the
>>>>>"hash collision chain problem". Rehashing eliminates chains, but
>>>>>produces more cache thrashing; the latter is more important on a PC,
>>>>>which already has poor memory bandwidth.
>>>>
>>>>Assuming hash keys are perfectly random, there should be no discernible
>>>>relation between distinct positions, no matter how similar they are.
>>>>
>>>>So if you hash to index I in the table, there should be no more
>>>>likelihood that there is an occupied entry at index I+1 than there is
>>>>at index I+K or I+rand(), right? If there is complete entropy here,
>>>>chaining should be uniform, regardless of how you rehash. So you
>>>>should be able to use I+1, since that is easiest, if you really want
>>>>to rehash.
>>>>
>>>>The fact that the hash elements are next to each other in memory, and
>>>>can be accessed with an index, has nothing to do with anything, does
>>>>it? There is no more relation between I and I+1 than there is between
>>>>I and J.
>>>>
>>>>The above is my intuition. Intuition also tells me that the world is
>>>>flat, though.
>>>>
>>>>bruce
>>>
>>>I agree with that.
>>>
>>>Intuitively I also think that there is no scope for improvement on the
>>>linear method. Your argument makes the point. Maybe a stronger way to
>>>say it is this: complex probing methods are equivalent to selecting a
>>>different hash function, or to rehashing the hashed value. If the
>>>hashing is good to start with, this doesn't achieve anything.
>>>A different analogy: when you encrypt a message with some encryption
>>>method, some people feel that the message would be REALLY secret if at
>>>the end you multiply everything by 7 and add 510. That's not true, of
>>>course.
>>>
>>>Amir
>>
>>I am not sure I can agree with Amir/Bruce yet.
>>
>>A single hash function used once that hashes to a given address and
>>then linearly uses the next 7 slots is going to "back up" a lot faster
>>when multiple board positions hash to the same address. With random
>>rehashing they won't back up (as much), since the randomizer will
>>disperse them throughout the table rather than into a group of 7 slots
>>after the main address.
>>
>>--Stuart
>
>That is correct, but any other way you do this gives you the same
>amount of competition for each slot, with the same random distribution.
>If you avoid overloading a slot through hash hits to the same place,
>you will find that statistically it gets equally overloaded through
>hash hits to different places.
>
>Amir

This is partially correct, but only partially. Here is why.

Assume we use 8 probes into the hash table, and that we simply use N,
N+1, N+2, ..., N+7 as the probe addresses. You'd like to think that
P(any entry) is 1/X, where X is the number of entries, but this is not
so, and you can easily test it in your program to confirm it. What
happens is that you probe to position P and use the set of 8 entries
there as candidates for replacement. Any position that hashes to P, or
to P+1 through P+7, tends to interfere with it, because a position that
hashes to P+2 uses the set P+2 through P+9, which has six entries in
common with the position that hashed to P.

If you use random rehashing, based on extracting Y bits from the
original hash signature, you will likely get two different strides for
the positions that hashed to P and P+2. So the position that hashes to
P would likely have the probe set P, P+a, P+2a, etc.,
while the position that hashes to P+2 would have the set P+2, P+2+b,
P+2+2b, and so on.

From hashing theory, this tends to eliminate "chaining". Once you store
at P and P+1, the probability of hitting that chain is 2/N, because the
chain has length 2, while the probability of hitting any other position
(assuming the table is empty except for P and P+1) is still 1/N. So
once a chain gets established, the probability of hitting it is higher
than that of hitting any single empty entry.

Simple linear rehashing therefore works well, but in those cases where
the hash probes are not really uniformly distributed (and, by simple
random-sampling theory, our hashing algorithm is *not* going to
distribute all hash keys perfectly uniformly), random rehashing
actually does help to distribute hash entries, even when the original
hash indices are poorly distributed.

To test this, clear your hash table and note how many positions you are
about to store. On each probe, increment one counter if you store into
an empty entry, and a different counter if you probe an occupied one.
Compare the two counters against the number of entries you stored, and
decide whether your keys are as uniform as you originally thought. I
ran this a couple of years ago when we had the big hashing discussion
in r.g.c.c. The results are interesting...
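The interference argument above is easy to simulate. The sketch below is an illustration only, not engine code from the thread: it fills a half-full table with deliberately non-uniform keys, once with 8 adjacent linear probes and once with a per-key stride extracted from other bits of the signature, and counts how many insertions find all 8 candidate slots already occupied. The hot-region key model, table size, and bit positions are all assumptions chosen for the demo.

```python
import random

random.seed(42)
TABLE = 1 << 14          # 16384 slots
PROBES = 8
N = TABLE // 2           # store until the table is half full

def make_keys(n):
    # Mildly non-uniform indices: a quarter of the keys land in a small
    # "hot" region, mimicking an imperfect hash (an assumption, chosen
    # only to make the clustering effect visible).
    keys = []
    for _ in range(n):
        if random.random() < 0.25:
            idx = random.randrange(TABLE // 64)        # hot region
        else:
            idx = random.randrange(TABLE)
        # Fake 64-bit signature whose low bits give the table index.
        keys.append((random.getrandbits(48) << 16) | idx)
    return keys

def linear_probes(sig):
    base = sig % TABLE
    return [(base + i) % TABLE for i in range(PROBES)]

def stride_probes(sig):
    base = sig % TABLE
    # Odd stride taken from other signature bits ("extracting Y bits
    # from the original hash signature", per the post).
    stride = ((sig >> 17) & (TABLE - 1)) | 1
    return [(base + i * stride) % TABLE for i in range(PROBES)]

def fill(keys, probe_fn):
    table = [None] * TABLE
    failures = 0
    for sig in keys:
        for slot in probe_fn(sig):
            if table[slot] is None:
                table[slot] = sig
                break
        else:
            failures += 1      # all 8 candidate slots were occupied
    return failures

keys = make_keys(N)
print("linear probing, failed stores:", fill(keys, linear_probes))
print("random stride,  failed stores:", fill(keys, stride_probes))
```

With a perfectly uniform hash the two schemes behave almost identically, which is Bruce's and Amir's point; the gap only opens up because these keys are clustered, which is exactly the situation the post argues for.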
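The counting experiment described at the end can be sketched as follows. This is a hypothetical instrumentation, not code from the thread: it stores K random signatures into a cleared table with a single probe each, counts stores into empty entries versus probes that hit occupied ones, and compares the empty-store count with what a perfectly uniform hash predicts, namely S*(1-(1-1/S)^K) distinct occupied slots. A real engine would feed its own position signatures through `store` instead of `random.getrandbits`.

```python
import random

random.seed(0)
SIZE = 1 << 18                 # table slots (an arbitrary demo size)
K = SIZE // 4                  # number of positions to store
table = [None] * SIZE
stored_empty = 0               # stores that found an empty entry
collisions = 0                 # probes that hit an occupied entry

def store(signature):
    global stored_empty, collisions
    slot = signature & (SIZE - 1)
    if table[slot] is None:
        stored_empty += 1
    else:
        collisions += 1
    table[slot] = signature    # always-replace, single probe

for _ in range(K):
    store(random.getrandbits(64))

# Expected number of distinct occupied slots for a uniform hash.
expected = SIZE * (1 - (1 - 1 / SIZE) ** K)
print("stores into empty entries:", stored_empty)
print("collisions               :", collisions)
print("uniform-hash expectation :", round(expected))
```

If the empty-store count falls well below the uniform expectation, the keys are less uniform than assumed, and that is exactly the case in which random rehashing pays off.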