Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Rehashing

Author: Don Dailey
Date: 14:13:50 01/20/98
On January 20, 1998 at 16:10:46, Stuart Cracraft wrote:

>On January 20, 1998 at 16:00:31, Bruce Moreland wrote:
>
>>
>>On January 20, 1998 at 15:27:01, Robert Hyatt wrote:
>>
>>>yes.  it's old news.  And a data structures course would call this
>>>the "hash collision chain problem".  rehashing eliminated chains, but
>>>produces more cache thrashing...  the latter is more important on a PC
>>>that already has poor memory bandwidth...
>>
>>Assume hash keys are perfectly random, there should be no discernible
>>relation between distinct positions, no matter how similar they are.
>>
>>So if you hash to index I in the table, there should be no more
>>liklihood that there is an occupied entry at index I+1 than there is at
>>index I+K or I+rand(), right?  If there is complete entropy here,
>>chaining should be uniform, regardless of how you rehash.  So you should
>>be able to use I+1, since that is easiest, if you really want to rehash.
>>
>>The fact that the hash elements are next to each other in memory, and
>>can be accessed with an index, has nothing to do with anything, does it?
>> There is no more relation betwen I and I+1 than there is between I and
>>J.
>>
>>The above is my intuition.  Intuition also tells me that the world is
>>flat, though.
>>
>>bruce
>
>My thought on this is that in the linear case, if you have several
>positions hashing to the same slot, then the chain will fill up and
>there will be more collisions preventing replacements and storage of
>incoming positions. The rand() will perturb the algorithm differently
>each time preventing collisions from filling up these hash slot
>sequences. So that's why I+rand() is more apt to be empty than I+K.
>
>--Stuart

Stuart,

Bruce's intuition is right on this one.  The phenomenon we are talking
about is often refered to as "clumping."   So typically I+rand() is
considered a better algorithm but this is ONLY when an application
does NOT have normal placement characteristics.  If you tend to use
certain addresses more than others this will be a problem (which we
do not.)   Also you might have a problem with a poor hash function but
the commonly used one for chess is good.  But with chess clumping is not
relevant, you are equally likely to get 10 misses with either scheme.
So I+rand() is NOT more likely to be empty than I+K in our domain.
And I+k is much faster to compute!

- Don
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.