Author: Bruce Moreland
Date: 13:11:54 01/20/99
Go up one level in this thread
On January 19, 1999 at 15:43:41, Robert Hyatt wrote:
>On January 19, 1999 at 13:06:33, Bruce Moreland wrote:
>
>>
>>On January 19, 1999 at 08:13:52, Robert Hyatt wrote:
>>
>>>On January 18, 1999 at 19:01:01, Dann Corbit wrote:
>>
>>>(2) use some form of hash 'buckets'. IE in cray blitz,
>>>I took the low order N bits of the hash signature as the probe address. I took
>>>the next 10 bits as a 'rehash offset'. (call this RO). I then checked the
>>>positions at offset, offset+RO, offset+2RO, ... offset+7RO. Either approach is
>>>ok. The latter tends to solve the 'chaining' problem because two different
>>>hash keys identical in the last N bits (probe address) still may be different in
>>>the next 10 bits, giving a different 'bucket'.
>>
>>I have never understood this logic, and would like to discuss it.
>>
>>The situation is that you get a collision at one point in the table, and the
>>question is where to rehash to.
>>
>>I would like to ask why not just set RO to some constant, for instance one.
>>You've got a collision in a table that will tend to be full, I don't understand
>>why the element at some random location in the table is more likely to be empty
>>or unimportant than the one at the next index in the table.
>>
>> for (;;) {
>> if (FOverwrote(index))
>> return;
>> index = (index + 1) % tabsize;
>> }
>>
>>As opposed to:
>>
>> ro = (key >> N) & 2047;
>> for (;;) {
>> if (FOverwrote(index))
>> return;
>> index = (index + ro) % tabsize;
>> }
>>
>>I don't see what you get for your extra instructions.
>>
>>bruce
>
>
>This does two things... the most important of which is to eliminate 'chaining'
>in the probing/storing. IE if you hash to entry N in the table, and you use
>a constant offset, you get a series of entries N, N+R, N+2R, ..., N+7R. If
>another key hashes to N+2R (different low order n bits), now you use the bucket
>N+2R, N+3R, ..., N+9R. Notice that those two have 6 entries in common.
>
>In Cray Blitz, or in the approach commonly called 'rehashing' the first bucket
>is still N, but now R is a variable extracted directly from the hash key. So
>we probe at N, N+R1, N+2R1 ... N+7R1, and for the second try, we go to N+2R1
>(I will stick with the second entry maps to N+2R from above), but we now
>probe N+2R1, N+2R1+R2, N+2R1+2R2, ..., N+2R1+7R7, which only has the first
>entry in common, unless R1=R2 which is unlikely since 0 <= R1 <= 1023 in our
>hashing scheme.
>
>The second benefit (mainly on a Cray) is that the probes above are done as
>vector operations. And now the two vector loads have different 'strides' which
>means we don't line two processors up on the same group of memory banks (the
>cray can use _huge_ levels of interleaving as in 256 banks and up) which means
>this is significantly faster due to fewer bank conflicts between the two
>processors. This turns out to be a really significant thing on a Cray, and was
>the _main_ reason we did it.
>
>Lastly it does a better job of spreading hash entries around in the table as the
>first point shows, which helps make sure all the table gets used.
>
>On a PC this is not so great. The best plan there is to test 8 consecutive
>entries because of how cache is burst-loaded 32 bytes at a time. The old CB
>approach results in 8 X 32byte cache line fills (8 entries, even though each
>entry was only 12 bytes long), while the more 'PC-ish' approach would reference
>8 entries in 96 consecutive bytes, which is only 3 cache line fills. Almost 3x
>faster.
>
>Hope that helps...
I still don't understand why this does anything. If you are standing in the
middle of a crowd and throw a rock in the air, you are going to hit someone in
the head. I don't see why there is any less chance of hitting someone if you
throw it so it lands ten feet from you or a hundred.
I don't see why "next to" is a more interesting relationship in a hash table
than "N away from" is.
If you want to talk to someone, and you don't care who, you can pick a phone
number at random out of the phone book and call. If they don't answer the
phone, I don't know why you'd be more or less likely to get someone to answer
next time if you pick the next entry or if you pick from a random spot in the
book.
bruce
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.