Computer Chess Club Archives


Subject: Re: Rehashing

Author: Robert Hyatt

Date: 18:49:52 01/21/98

On January 21, 1998 at 17:17:45, Stuart Cracraft wrote:

>I am currently "settled" on rehashing with 8 "slots" including the
>original slot. However, I have not noticed any major improvement or
>degradation of this compared to no rehashing and just using a single slot.
>
>--Stuart
>

If you read Beal's paper, which produced results very similar to those
we found in Cray Blitz years ago, the idea of probing multiple entries
doesn't have much effect until the hash table becomes over-subscribed.

Don found significant improvement when searching 10X the nodes that the
hash table would hold.  But for less than 3X or so, there was no real
benefit.  This was in a recent ICCA Journal.

You might not be testing long enough for the hash size you are using...
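The over-subscription effect is easy to reproduce with a toy simulation
(a hedged sketch, not anyone's engine code; the table size, load factors,
and `overwrite_rate` helper are all invented for illustration):

```python
import random

def overwrite_rate(table_size, num_stores, probes):
    """Fraction of stores forced to overwrite an occupied entry,
    given `probes` consecutive candidate slots per position."""
    rng = random.Random(1)
    table = [None] * table_size
    overwrites = 0
    for _ in range(num_stores):
        key = rng.getrandbits(64)
        slots = [(key + i) % table_size for i in range(probes)]
        empty = next((s for s in slots if table[s] is None), None)
        if empty is None:
            overwrites += 1          # every candidate slot was taken
            table[slots[0]] = key    # replace something anyway
        else:
            table[empty] = key
    return overwrites / num_stores

# Store counts of roughly 0.5X, 3X and 10X the table size:
for mult in (0.5, 3, 10):
    n = int(1024 * mult)
    print(mult, overwrite_rate(1024, n, 1), overwrite_rate(1024, n, 8))
```

At half load the 8-slot window rarely changes anything; the extra probes
only start to matter once the store count is several multiples of the
table size, which matches the observation above.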



>On January 21, 1998 at 16:00:15, Don Dailey wrote:
>
>>Stuart,
>>
>>I think the best way to resolve this is to try both ways.  These
>>issues can sometimes get very tricky and non-intuitive.
>>For instance consider:
>>
>>  1.) With linear probing, there is a definite upper bound on
>>      the number of probes you will have to apply to find an
>>      empty location (assuming it exists.)
>>
>>  2.) With re-hashing any number of probes less than INFINITE
>>      is possible (but not INFINITE itself!)
>>
>>This simple reasoning tells us the two are not equivalent.  Here
>>at MIT I have access to lots of experts on this subject (including
>>Ron Rivest and Charles Leiserson) and will try to pick their
>>brains on this subject.
>>
>>- Don
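The contrast Don draws in points 1 and 2 can be checked with a
throwaway experiment (a sketch with made-up sizes; `probes_linear` and
`probes_random` are hypothetical helper names, not anyone's engine code):

```python
import random

def probes_linear(table, start):
    """Linear probing: never more than len(table) probes are needed
    to find an empty slot, provided one exists."""
    for i in range(len(table)):
        if table[(start + i) % len(table)] is None:
            return i + 1
    return None  # table completely full

def probes_random(table, rng):
    """Random rehashing: each probe is an independent random slot,
    so any finite number of probes is possible, with no fixed bound."""
    count = 1
    while table[rng.randrange(len(table))] is not None:
        count += 1
    return count

rng = random.Random(7)
table = [None] * 64
for slot in rng.sample(range(64), 60):   # fill to ~94% load
    table[slot] = True

worst_linear = max(probes_linear(table, s) for s in range(64))
worst_random = max(probes_random(table, rng) for _ in range(64))
print(worst_linear, worst_random)
```

`worst_linear` can never exceed the table size; `worst_random` usually
stays small but, in principle, has no upper limit.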
>>
>>
>>
>>
>>On January 21, 1998 at 09:03:26, Robert Hyatt wrote:
>>
>>>On January 21, 1998 at 05:10:53, Amir Ban wrote:
>>>
>>>>On January 20, 1998 at 18:42:01, Stuart Cracraft wrote:
>>>>
>>>>>On January 20, 1998 at 17:55:34, Amir Ban wrote:
>>>>>
>>>>>>On January 20, 1998 at 16:00:31, Bruce Moreland wrote:
>>>>>>
>>>>>>>
>>>>>>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>yes.  it's old news.  And a data structures course would call this
>>>>>>>>the "hash collision chain problem".  rehashing eliminated chains, but
>>>>>>>>produces more cache thrashing...  the latter is more important on a PC
>>>>>>>>that already has poor memory bandwidth...
>>>>>>>
>>>>>>>Assuming hash keys are perfectly random, there should be no discernible
>>>>>>>relation between distinct positions, no matter how similar they are.
>>>>>>>
>>>>>>>So if you hash to index I in the table, there should be no more
>>>>>>>likelihood that there is an occupied entry at index I+1 than there is at
>>>>>>>index I+K or I+rand(), right?  If there is complete entropy here,
>>>>>>>chaining should be uniform, regardless of how you rehash.  So you should
>>>>>>>be able to use I+1, since that is easiest, if you really want to rehash.
>>>>>>>
>>>>>>>The fact that the hash elements are next to each other in memory, and
>>>>>>>can be accessed with an index, has nothing to do with anything, does it?
>>>>>>>There is no more relation between I and I+1 than there is between I and
>>>>>>>J.
>>>>>>>
>>>>>>>The above is my intuition.  Intuition also tells me that the world is
>>>>>>>flat, though.
>>>>>>>
>>>>>>>bruce
>>>>>>
>>>>>>
>>>>>>I agree with that.
>>>>>>
>>>>>>Intuitively I also think that there is no scope for improvement on the
>>>>>>linear method. Your argument makes the point. Maybe a stronger way to
>>>>>>say it is this: Complex probing methods are equivalent to selecting a
>>>>>>different hash function, or to rehashing the hashed value. If the
>>>>>>hashing is good to start with, this doesn't achieve anything.
>>>>>>
>>>>>>A different analogy: When you encrypt a message with some encryption
>>>>>>method, some people feel that the message would be REALLY secret if at
>>>>>>the end you multiply everything by 7 and add 510. That's not true, of
>>>>>>course.
>>>>>>
>>>>>>Amir
>>>>>
>>>>>I am not sure I can agree with Amir/Bruce yet.
>>>>>
>>>>>A single hash function that hashes to a given address and then
>>>>>uses the next 7 slots linearly is going to "back up" a lot faster
>>>>>when multiple board positions hash to the same address.  With
>>>>>random rehashing they won't back up (as much), since the
>>>>>randomizer will disperse them throughout the table and not just
>>>>>in the group of 7 slots after the main address.
>>>>>
>>>>>--Stuart
>>>>
>>>>That is correct, but any other way you do this, you will get the
>>>>same amount of competition for each slot under the same random
>>>>distribution.  If you avoid overloading a slot through hash hits to
>>>>the same place, you will find that statistically it gets equally
>>>>overloaded through hash hits to different places.
>>>>
>>>>Amir
>>>
>>>
>>>This is partially correct, but only partially.  Here is why.
>>>
>>>Assume we use 8 probes into the hash table, and that we simply use
>>>N, N+1, N+2, ..., N+7 as the probe addresses.  You'd like to think
>>>that P(any entry) is 1/X, where X is the number of entries, but this
>>>is not so... and you can easily test it in your program to confirm
>>>it.
>>>
>>>So what happens is that you probe to position P, and use the set of
>>>8 entries there as candidates for replacement.  Any position that
>>>hashes to P, or to P+1,2,3,4,5,6,7, tends to interfere with the
>>>others, because hashing to P+2 means we use the set
>>>P+2,3,4,5,6,7,8,9, which has 6 positions in common with the position
>>>that hashed to P.
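The 6-slot overlap is easy to confirm concretely (a throwaway sketch;
the table size here is arbitrary):

```python
TABLE_SIZE = 1 << 16   # arbitrary power-of-two table size

def window(p, n=8):
    """The n consecutive candidate slots starting at index p."""
    return {(p + i) % TABLE_SIZE for i in range(n)}

shared = window(1000) & window(1002)
print(sorted(shared))   # the 6 slots 1002..1007 appear in both sets
```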
>>>
>>>If you use random rehashing, based on extracting Y bits from the
>>>original hash signature, you will likely get two different strides
>>>for the positions that hashed to P and P+2.  So the position that
>>>hashes to P would likely have the set P, P+a, P+2a, etc., while the
>>>position that hashes to P+2 would have the set P+2, P+2+b,
>>>P+2+2b, ...
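A minimal sketch of the stride idea, assuming the stride comes from a
different bit field of the same signature (the bit positions and table
size are invented, not taken from any particular program):

```python
def probe_set(signature, table_size=1024, num_probes=8):
    """Bucket from the low bits, stride from higher bits of the same
    signature.  Forcing the stride odd keeps it coprime to a
    power-of-two table size, so the probes never revisit a slot
    within the first table_size steps."""
    base = signature % table_size
    stride = ((signature >> 32) % table_size) | 1
    return [(base + i * stride) % table_size for i in range(num_probes)]

# Two signatures landing two slots apart get unrelated probe sets:
print(probe_set(0x1234ABCD00000005))
print(probe_set(0x9E7700FF00000007))
```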
>>>
>>>From hashing theory, this tends to eliminate "chaining", because
>>>once you store at P and P+1, the probability of hitting that chain
>>>is 2/N (the chain is of length 2), while the probability of hitting
>>>any other position (assuming the table is empty except for P and
>>>P+1) is still 1/N.  So once a chain gets established, the
>>>probability of hitting it is higher than that of hitting any single
>>>empty entry.
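The clustering effect can be seen by filling a toy table both ways and
measuring the longest run of occupied slots (a rough simulation with
invented sizes, not a proof):

```python
import random

def fill(table_size, n_keys, use_stride):
    """Insert keys by probing until an empty slot is found.
    use_stride=False is linear probing; True derives an odd stride
    from extra signature bits (random rehashing)."""
    rng = random.Random(42)
    table = [False] * table_size
    for _ in range(n_keys):
        sig = rng.getrandbits(64)
        pos = sig % table_size
        step = (((sig >> 32) % table_size) | 1) if use_stride else 1
        while table[pos]:
            pos = (pos + step) % table_size
        table[pos] = True
    return table

def longest_run(table):
    """Length of the longest contiguous run of occupied slots."""
    best = run = 0
    for occupied in table:
        run = run + 1 if occupied else 0
        best = max(best, run)
    return best

print(longest_run(fill(4096, 3500, False)),   # linear probing
      longest_run(fill(4096, 3500, True)))    # random rehashing
```

At the same load, linear probing tends to produce the long chains
described above, while the per-key stride breaks them up.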
>>>
>>>So simple linear hashing works well.  But in those cases where the
>>>hash probes are not really very uniformly distributed (and by simple
>>>random sampling theory our hashing algorithm is *not* going to
>>>distribute all hash keys uniformly), random rehashing actually does
>>>help to distribute hash entries, even when the original hash indices
>>>are not well distributed.
>>>
>>>To test this, clear your hash table, set a counter to the number of
>>>positions you have, and on each probe, if you store over an empty
>>>entry, increment one counter; if you probe at a used position,
>>>increment a different counter.  Compare the two against the count of
>>>entries you have, and decide whether those keys are as uniform as
>>>you originally thought.
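The counter test might look like this (a sketch; the single-probe
layout, sizes, and `uniformity_counters` name are assumptions for
illustration):

```python
import random

def uniformity_counters(table_size, signatures):
    """Count stores into empty entries versus probes that land on an
    already-used entry, as the test above describes."""
    table = [False] * table_size
    stored_empty = hit_used = 0
    for sig in signatures:
        slot = sig % table_size
        if table[slot]:
            hit_used += 1      # collided with an occupied entry
        else:
            stored_empty += 1  # filled a fresh entry
            table[slot] = True
    return stored_empty, hit_used

rng = random.Random(0)
sigs = [rng.getrandbits(64) for _ in range(2048)]
empty, used = uniformity_counters(4096, sigs)
print(empty, used)   # compare the split against len(sigs)
```

If the keys were truly uniform, `used` should roughly match a
birthday-style estimate for this load factor; a noticeably higher count
suggests the keys are less uniform than expected.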
>>>
>>>I ran this a couple of years ago when we had the big hashing
>>>discussion in r.g.c.c.  The results are interesting...


