Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Rehashing

Author: Don Dailey

Date: 13:00:15 01/21/98

Go up one level in this thread


Stuart,

I think the best way to resolve this is to try both ways.  This
issues can sometimes get very very trickly and non-intuitive.
For instance consider:

  1.) With linear probing, there is a definite upper bound on
      the number of probes you will have to apply to find an
      empty location (assuming it exists.)

  2.) With re-hashing any number of probes less than INFINITE
      is possible (but not INFINITE itself!)

This simple reasoning tells us the two are not equivalent.  Here
at MIT I have access to lot's of experts on this subject (including
Ron Rivest and Charles Leiserson) and will try to pick their
brains on this subject.

- Don




On January 21, 1998 at 09:03:26, Robert Hyatt wrote:

>On January 21, 1998 at 05:10:53, Amir Ban wrote:
>
>>On January 20, 1998 at 18:42:01, Stuart Cracraft wrote:
>>
>>>On January 20, 1998 at 17:55:34, Amir Ban wrote:
>>>
>>>>On January 20, 1998 at 16:00:31, Bruce Moreland wrote:
>>>>
>>>>>
>>>>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote:
>>>>>
>>>>>>yes.  it's old news.  And a data structures course would call this
>>>>>>the "hash collision chain problem".  rehashing eliminated chains, but
>>>>>>produces more cache thrashing...  the latter is more important on a PC
>>>>>>that already has poor memory bandwidth...
>>>>>
>>>>>Assume hash keys are perfectly random, there should be no discernible
>>>>>relation between distinct positions, no matter how similar they are.
>>>>>
>>>>>So if you hash to index I in the table, there should be no more
>>>>>liklihood that there is an occupied entry at index I+1 than there is at
>>>>>index I+K or I+rand(), right?  If there is complete entropy here,
>>>>>chaining should be uniform, regardless of how you rehash.  So you should
>>>>>be able to use I+1, since that is easiest, if you really want to rehash.
>>>>>
>>>>>The fact that the hash elements are next to each other in memory, and
>>>>>can be accessed with an index, has nothing to do with anything, does it?
>>>>> There is no more relation betwen I and I+1 than there is between I and
>>>>>J.
>>>>>
>>>>>The above is my intuition.  Intuition also tells me that the world is
>>>>>flat, though.
>>>>>
>>>>>bruce
>>>>
>>>>
>>>>I agree with that.
>>>>
>>>>Intutively I also think that there is no scope for improvement on the
>>>>linear method. Your argument makes the point. Maybe a stronger way to
>>>>say it is this: Complex probing methods are equivalent to selecting a
>>>>different hash function, or to rehashing the hashed value. If the
>>>>hashing is good to start with, this doesn't achieve anything.
>>>>
>>>>A different analogy: When you encrypt a message with some encryption
>>>>method, some people feel that the message would be REALLY secret if at
>>>>the end you multiply everything by 7 and add 510. That's not true, of
>>>>course.
>>>>
>>>>Amir
>>>
>>>I am not sure I can agree with Amir/Bruce yet.
>>>
>>>A single hash function used once that hashes to a given
>>>address and then use linear use the next 7 slots is going
>>>to "back up" a lot faster when multiple board positions hash
>>>to the same address. With random rehashing, they won't back
>>>up (as much) since the randomizer will disperse them throughout
>>>the table and not just in group of 7 slots after the main address.
>>>
>>>--Stuart
>>
>>That is correct, but in any other way you do this you will get the same
>>amount of competition for each slot with the same random distribution.
>>If you avoid the overloading of a slot through hash hits to the same
>>place, you will find that statistically it gets equally overloaded
>>through hash hits to different places.
>>
>>Amir
>
>
>This is partially correct, but only partially.  Here is why.
>
>assume we use 8 probes into the hash, and that we simply use N, N+1,
>N+2,
>..., N+7 as the probe addresses.  You'd like to think that P(anyentry)
>is
>1/X where X=number of entries, but this is not so... and you can easily
>test
>it in your program to confirm it.
>
>So what happens is that you probe to position P, and use the set of 8
>entries there as candidates for replacement.  Any position that hashes
>to
>P, or to P+1,2,3,4,5,6,7 tend to interfere with each other, because
>hashing
>to P+2, means we use the set P+2,3,4,5,6,7,8,9 which has 6 positions in
>common with the position that hashed to P.
>
>If you use random rehashing, based on extracting Y bits from the
>original
>hash signature, you will likely get two different strides for the
>positions
>that hashed to P and P+2.  So the position that hashes to P, would
>likely
>have the set P, P+a, P+2a, etc... while the position that hashes to P+2
>would have the set P+2, P+2+b, P+2+2b,...
>
>From hashing theory, this tends to eliminate "chaining".. because once
>you
>store at P or P+1, the probability of hitting that "chain" is 2/n,
>because
>that chain is of length 2, while the probability of hitting any other
>position
>(assuming table is empty except for P,P+1 is still 1/N.  So once a chain
>gets
>established, the probability of hitting it is higher than that of
>hitting
>any single empty entry.
>
>So simple linear hashing works well, but, in those cases where the hash
>probes are not really very uniformly distributed, and by simple
>deduction
>our hashing algorithm is *not* going to uniformly distribute all hash
>keys,
>simply based on random sampling theory, random rehashing actually does
>help
>to distribute hash entries, even when the original hash indices are not
>well distributed.
>
>To test, clear your hash, set a counter to the number of positions you
>have,
>and on each probe, if you store over an empty entry increment a counter
>by
>one, and if you probe at a used position, increment a different counter.
>Compare the two to the thing telling you how many entries you have, and
>decide whether those keys are as uniform as you originally thought.
>
>I ran this a couple of years ago when we had the big hashing discussion
>in
>r.g.c.c.  The results are interesting...



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.