Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Rehashing

Author: Stuart Cracraft

Date: 14:17:45 01/21/98

Go up one level in this thread


I am currently "settled" on rehashing with 8 "slots" including the
original slot. However, I have not noticed any major improvement or
degradation of this compared to no rehashing and just using single slot.

--Stuart

On January 21, 1998 at 16:00:15, Don Dailey wrote:

>Stuart,
>
>I think the best way to resolve this is to try both ways.  This
>issues can sometimes get very very trickly and non-intuitive.
>For instance consider:
>
>  1.) With linear probing, there is a definite upper bound on
>      the number of probes you will have to apply to find an
>      empty location (assuming it exists.)
>
>  2.) With re-hashing any number of probes less than INFINITE
>      is possible (but not INFINITE itself!)
>
>This simple reasoning tells us the two are not equivalent.  Here
>at MIT I have access to lot's of experts on this subject (including
>Ron Rivest and Charles Leiserson) and will try to pick their
>brains on this subject.
>
>- Don
>
>
>
>
>On January 21, 1998 at 09:03:26, Robert Hyatt wrote:
>
>>On January 21, 1998 at 05:10:53, Amir Ban wrote:
>>
>>>On January 20, 1998 at 18:42:01, Stuart Cracraft wrote:
>>>
>>>>On January 20, 1998 at 17:55:34, Amir Ban wrote:
>>>>
>>>>>On January 20, 1998 at 16:00:31, Bruce Moreland wrote:
>>>>>
>>>>>>
>>>>>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote:
>>>>>>
>>>>>>>yes.  it's old news.  And a data structures course would call this
>>>>>>>the "hash collision chain problem".  rehashing eliminated chains, but
>>>>>>>produces more cache thrashing...  the latter is more important on a PC
>>>>>>>that already has poor memory bandwidth...
>>>>>>
>>>>>>Assume hash keys are perfectly random, there should be no discernible
>>>>>>relation between distinct positions, no matter how similar they are.
>>>>>>
>>>>>>So if you hash to index I in the table, there should be no more
>>>>>>liklihood that there is an occupied entry at index I+1 than there is at
>>>>>>index I+K or I+rand(), right?  If there is complete entropy here,
>>>>>>chaining should be uniform, regardless of how you rehash.  So you should
>>>>>>be able to use I+1, since that is easiest, if you really want to rehash.
>>>>>>
>>>>>>The fact that the hash elements are next to each other in memory, and
>>>>>>can be accessed with an index, has nothing to do with anything, does it?
>>>>>> There is no more relation betwen I and I+1 than there is between I and
>>>>>>J.
>>>>>>
>>>>>>The above is my intuition.  Intuition also tells me that the world is
>>>>>>flat, though.
>>>>>>
>>>>>>bruce
>>>>>
>>>>>
>>>>>I agree with that.
>>>>>
>>>>>Intutively I also think that there is no scope for improvement on the
>>>>>linear method. Your argument makes the point. Maybe a stronger way to
>>>>>say it is this: Complex probing methods are equivalent to selecting a
>>>>>different hash function, or to rehashing the hashed value. If the
>>>>>hashing is good to start with, this doesn't achieve anything.
>>>>>
>>>>>A different analogy: When you encrypt a message with some encryption
>>>>>method, some people feel that the message would be REALLY secret if at
>>>>>the end you multiply everything by 7 and add 510. That's not true, of
>>>>>course.
>>>>>
>>>>>Amir
>>>>
>>>>I am not sure I can agree with Amir/Bruce yet.
>>>>
>>>>A single hash function used once that hashes to a given
>>>>address and then use linear use the next 7 slots is going
>>>>to "back up" a lot faster when multiple board positions hash
>>>>to the same address. With random rehashing, they won't back
>>>>up (as much) since the randomizer will disperse them throughout
>>>>the table and not just in group of 7 slots after the main address.
>>>>
>>>>--Stuart
>>>
>>>That is correct, but in any other way you do this you will get the same
>>>amount of competition for each slot with the same random distribution.
>>>If you avoid the overloading of a slot through hash hits to the same
>>>place, you will find that statistically it gets equally overloaded
>>>through hash hits to different places.
>>>
>>>Amir
>>
>>
>>This is partially correct, but only partially.  Here is why.
>>
>>assume we use 8 probes into the hash, and that we simply use N, N+1,
>>N+2,
>>..., N+7 as the probe addresses.  You'd like to think that P(anyentry)
>>is
>>1/X where X=number of entries, but this is not so... and you can easily
>>test
>>it in your program to confirm it.
>>
>>So what happens is that you probe to position P, and use the set of 8
>>entries there as candidates for replacement.  Any position that hashes
>>to
>>P, or to P+1,2,3,4,5,6,7 tend to interfere with each other, because
>>hashing
>>to P+2, means we use the set P+2,3,4,5,6,7,8,9 which has 6 positions in
>>common with the position that hashed to P.
>>
>>If you use random rehashing, based on extracting Y bits from the
>>original
>>hash signature, you will likely get two different strides for the
>>positions
>>that hashed to P and P+2.  So the position that hashes to P, would
>>likely
>>have the set P, P+a, P+2a, etc... while the position that hashes to P+2
>>would have the set P+2, P+2+b, P+2+2b,...
>>
>>From hashing theory, this tends to eliminate "chaining".. because once
>>you
>>store at P or P+1, the probability of hitting that "chain" is 2/n,
>>because
>>that chain is of length 2, while the probability of hitting any other
>>position
>>(assuming table is empty except for P,P+1 is still 1/N.  So once a chain
>>gets
>>established, the probability of hitting it is higher than that of
>>hitting
>>any single empty entry.
>>
>>So simple linear hashing works well, but, in those cases where the hash
>>probes are not really very uniformly distributed, and by simple
>>deduction
>>our hashing algorithm is *not* going to uniformly distribute all hash
>>keys,
>>simply based on random sampling theory, random rehashing actually does
>>help
>>to distribute hash entries, even when the original hash indices are not
>>well distributed.
>>
>>To test, clear your hash, set a counter to the number of positions you
>>have,
>>and on each probe, if you store over an empty entry increment a counter
>>by
>>one, and if you probe at a used position, increment a different counter.
>>Compare the two to the thing telling you how many entries you have, and
>>decide whether those keys are as uniform as you originally thought.
>>
>>I ran this a couple of years ago when we had the big hashing discussion
>>in
>>r.g.c.c.  The results are interesting...



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.