Computer Chess Club Archives


Subject: Re: Rehashing

Author: Robert Hyatt

Date: 05:32:55 01/22/98


On January 22, 1998 at 01:33:50, Don Dailey wrote:

>Hi Bob,
>
>I know there is no one trying to find a free space.  But whatever
>the goal, the two are equivalent in some sense.  The question is
>what the odds are of getting an appropriate entry (whether empty or
>a better location to overwrite) quickly.   If you use
>set associativity it's pretty much moot (which I agree is the
>way to go), but this discussion is quite interesting.
>
>I just figured out that there is indeed a higher likelihood of
>not getting an appropriate entry on the second try with linear
>probing and open addressing.  Wherever a used entry exists in
>the table, there is a greater likelihood of an immediate neighbor
>in the direction of your probe.
>
>But the interesting feature of this is that your likelihood
>of getting a writable location (or free spot if you're looking
>at it this way) increases with each probe if you are using
>linear addressing and STAYS EXACTLY the same with rehashing!
>I suspect the two effects cancel each other to give the same average
>behavior, but I could be completely wrong about this; it's just
>a guess.
>

Maybe I misunderstand what you are saying.  But IMHO, whether you
use linear or random indexing away from the first position probed,
the chances of finding something to overwrite are equal,
as long as you have uniform probing.  I simply say that we don't
have uniform probing, and statistical sampling theory supports this,
because we are using random numbers to produce the hash key.

I.e., if you apply the poker test to your set of random numbers, you
would expect to find 4 consecutive random numbers that are identical,
for example, because the odds of drawing 4 of a kind are not zero.
Ditto for a sequence like 1,2,3,4,5,6,7,8,9,10, because the odds of
such a "run" are not zero.  And as long as we have this randomness
property to deal with, our hashing is not going to produce uniformly
distributed probes.  So anything I can do artificially to avoid
"chaining" in the table that would be seen by linear indexing with
a stride of "1" has to help.  Again, not multi-digit improvements, but
a modest single-digit improvement.
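
To make this concrete, here is a minimal sketch (in C, not code from
Crafty or Cray Blitz) of the two candidate-slot sequences being compared;
the entry layout, table size, and probe limit are assumptions for
illustration only:

    #include <stdint.h>

    #define TABLE_SIZE (1u << 20)    /* assumed power-of-two table size    */
    #define MAX_PROBES 8             /* only look at a handful of entries  */

    typedef struct {
        uint64_t key;                /* Zobrist signature of the position  */
        /* score, depth, best move, ... omitted in this sketch             */
    } HashEntry;

    static HashEntry table[TABLE_SIZE];

    /* Linear (stride-1) probing: candidates are consecutive slots, so any
       clustering in the keys shows up as chains of occupied neighbors.    */
    static unsigned linear_slot(uint64_t key, unsigned i)
    {
        return (unsigned)((key + i) & (TABLE_SIZE - 1));
    }

    /* Rehashing (double hashing): the stride comes from a second hash of
       the key, so successive candidates are scattered across the table.   */
    static unsigned rehash_slot(uint64_t key, unsigned i)
    {
        unsigned stride = (unsigned)(key >> 32) | 1u;   /* odd stride      */
        return (unsigned)((key + (uint64_t)i * stride) & (TABLE_SIZE - 1));
    }

Either way, the store loop would try i = 0 .. MAX_PROBES-1 and take the
first empty or overwritable entry; the argument above is only about how
evenly those candidates are spread when the keys themselves are not
perfectly uniform.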

*but* on a PC, this changes, because non-stride-1 probes bust the cache
badly and the performance loss will likely offset the random-probe
gain.  In Cray Blitz we didn't have any data cache to worry about, and
could take advantage of vector processing properties to do multiple
non-stride-1 probes with practically no cost at all.  I carried this
idea to the PC but found that multiple non-stride-1 probes were bad on
a PC...
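
The cache point can be quantified with a back-of-the-envelope sketch; the
64-byte line and 16-byte entry sizes below are assumptions, not figures
from this discussion:

    /* Assumed sizes, for illustration only.                              */
    #define CACHE_LINE_BYTES 64
    #define ENTRY_BYTES      16
    #define ENTRIES_PER_LINE (CACHE_LINE_BYTES / ENTRY_BYTES)   /* 4      */

    /* Distinct cache lines touched by 'probes' probes starting at 'slot'
       with the given stride (in table entries), ignoring wraparound.     */
    static unsigned lines_touched(unsigned slot, unsigned stride,
                                  unsigned probes)
    {
        unsigned first = slot / ENTRIES_PER_LINE;
        unsigned last  = (slot + (probes - 1) * stride) / ENTRIES_PER_LINE;
        return stride >= ENTRIES_PER_LINE ? probes       /* one line each  */
                                          : last - first + 1;
    }

With these numbers, four stride-1 probes usually cost one or two cache-line
fills, while four widely strided probes cost four, which is the loss that
offsets the better spread of the random probes.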



>- Don
>
>
>On January 21, 1998 at 21:47:00, Robert Hyatt wrote:
>
>>On January 21, 1998 at 16:00:15, Don Dailey wrote:
>>
>>>Stuart,
>>>
>>>I think the best way to resolve this is to try both ways.  These
>>>issues can sometimes get very, very tricky and non-intuitive.
>>>For instance consider:
>>>
>>>  1.) With linear probing, there is a definite upper bound on
>>>      the number of probes you will have to apply to find an
>>>      empty location (assuming it exists.)
>>>
>>>  2.) With re-hashing, any number of probes less than INFINITE
>>>      is possible (but not INFINITE itself!)
>>>
>>>This simple reasoning tells us the two are not equivalent.  Here
>>>at MIT I have access to lots of experts on this subject (including
>>>Ron Rivest and Charles Leiserson) and will try to pick their
>>>brains on this subject.
>>>
>>>- Don
>>>
>>
>>we aren't interpreting this the same way.  I am not thinking of linear
>>addressing for finding an "empty" position.  I would only look at N
>>positions, either N consecutive, or N spaced out (strided) based on some
>>constant that could be random.
>>
>>I would only consider N-way set associativity, however, where N is a
>>constant.  The larger N is, the better the hashing algorithm will
>>perform, but the higher the memory bandwidth needed to fetch that large
>>set of entries.  For a PC, N can't be very big or it will choke things,
>>and the cost of fetching N entries will outweigh the benefit of improved
>>hash hits.
>>
>>In the case of the Cray, N can be large because of the vector loads.  We
>>generally used N=16, but also played with 32, 64 and 128 with no
>>performance hit to speak of for loading, although beyond 16 we didn't
>>see much improvement in hashing performance either.
>>
>>But I would *never* consider searching linearly for an overwritable
>>entry.  That would totally bust a PC or most any non-Cray architecture...
>>At least I wouldn't search more than N entries where N is very small on
>>the PC.
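
For completeness, here is a minimal sketch of the N-way set-associative
probe described above.  The set size, entry fields, and the depth-based
overwrite choice are assumptions for illustration (the posts above do not
say which entry gets replaced); N would be around 16 on the Cray and kept
small on a PC:

    #include <stdint.h>

    #define SET_SIZE 4               /* N: ~16 on the Cray, small on a PC  */
    #define NUM_SETS (1u << 18)      /* assumed table size, in sets        */

    typedef struct {
        uint64_t key;                /* Zobrist signature                  */
        int16_t  depth;              /* assumed replacement criterion      */
        /* score, bound type, best move, ... omitted                       */
    } HashEntry;

    static HashEntry table[NUM_SETS][SET_SIZE];

    /* Fetch one set of N consecutive entries and scan it: return a match
       if one exists, otherwise the entry we would be willing to overwrite
       (here simply the shallowest one; the real replacement policy is a
       separate question).  On the Cray the whole set comes in as a vector
       load; on a PC a small N keeps it within a cache line or two.        */
    static HashEntry *probe_set(uint64_t key)
    {
        HashEntry *set    = table[key & (NUM_SETS - 1)];
        HashEntry *victim = &set[0];

        for (int i = 0; i < SET_SIZE; i++) {
            if (set[i].key == key)
                return &set[i];               /* hash hit                  */
            if (set[i].depth < victim->depth)
                victim = &set[i];             /* best overwrite candidate  */
        }
        return victim;
    }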


