Computer Chess Club Archives



Subject: Re: Data structures question for hash tables

Author: Eugene Nalimov

Date: 11:16:54 01/19/99



On January 19, 1999 at 13:06:33, Bruce Moreland wrote:

>
>On January 19, 1999 at 08:13:52, Robert Hyatt wrote:
>
>>On January 18, 1999 at 19:01:01, Dann Corbit wrote:
>
>>(2) use some form of hash 'buckets'.  I.e., in Cray Blitz,
>>I took the low-order N bits of the hash signature as the probe address, and
>>the next 10 bits as a 'rehash offset' (call this RO).  I then checked the
>>positions at offset, offset+RO, offset+2*RO, ... offset+7*RO.  Either approach
>>is OK.  The latter tends to solve the 'chaining' problem because two hash keys
>>identical in the last N bits (the probe address) may still differ in the next
>>10 bits, giving a different 'bucket'.
>
>I have never understood this logic, and would like to discuss it.
>
>The situation is that you get a collision at one point in the table, and the
>question is where to rehash to.
>
>I would like to ask why not just set RO to some constant, for instance one.
>You've got a collision in a table that will tend to be full, and I don't
>understand why the element at some random location in the table is more likely
>to be empty or unimportant than the one at the next index.
>
>    for (;;) {
>        if (FOverwrote(index))
>            return;
>        index = (index + 1) % tabsize;
>    }
>
>As opposed to:
>
>    ro = (key >> N) & 2047;
>    for (;;) {
>        if (FOverwrote(index))
>            return;
>        index = (index + ro) % tabsize;
>    }
>
>I don't see what you get for your extra instructions.
>
>bruce

Linear rehash produces large clusters of non-empty elements.
An element that conflicts with *any* element in a cluster gets
appended to that cluster, so big clusters grow bigger and
bigger over time. With non-linear rehash, conflicting elements
are scattered among other clusters instead. You can find
discussion, analysis, and experimental results in

  D. Knuth, The Art of Computer Programming, Vol.3,
            2nd edition, pp. 526-531.

(the data is there in the 1st edition, too, but page numbers
are different).
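
For reference, the standard approximations from those pages: at
load factor a (the fraction of the table that is occupied), an
unsuccessful search costs about

    0.5 * (1 + 1/(1-a)^2)   probes with linear rehash,
    1/(1-a)                 probes with independent (non-linear) rehash.

At a = 0.9 that is roughly 50.5 probes versus 10, so the
clustering penalty is not subtle.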

At least, that is what happens with a classic hash table that
has no limit on the number of retries. When there is such a
limit I don't know exactly what will happen, but I suspect
that when the limit is large enough (e.g. 8), non-linear
rehash is still better in terms of hash-table collisions -
almost all the assumptions of the classic case still hold.
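
A minimal sketch of the bounded-probe scheme under discussion,
in C (the names, the sizes, and the key==0 empty-slot convention
are illustrative assumptions, not taken from any particular
program):

    #include <stdint.h>

    #define TABLE_BITS 20
    #define TABLE_SIZE (1u << TABLE_BITS)
    #define MAX_TRIES  8

    typedef struct {
        uint64_t key;    /* full signature, kept for verification */
        /* score, draft, best move, etc. would live here */
    } HashEntry;

    static HashEntry table[TABLE_SIZE];

    /* Probe at most MAX_TRIES slots.  With step == 1 this is the
     * linear rehash; with the step taken from the next bits of the
     * signature it is the Cray Blitz style non-linear rehash.
     * Forcing the step odd (a tweak added here) keeps it nonzero
     * and coprime with the power-of-two table size, so all probed
     * slots are distinct. */
    static HashEntry *probe(uint64_t key, int linear)
    {
        uint32_t index = (uint32_t)key & (TABLE_SIZE - 1);
        uint32_t step  = linear ? 1
                       : (((uint32_t)(key >> TABLE_BITS) & 2047) | 1);
        for (int tries = 0; tries < MAX_TRIES; tries++) {
            if (table[index].key == 0 || table[index].key == key)
                return &table[index];
            index = (index + step) & (TABLE_SIZE - 1);
        }
        return 0;  /* all slots taken; caller picks a victim to overwrite */
    }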

On the other hand, non-linear rehash tends to greatly increase
the number of L2 cache misses: successive probes land far apart
in the table instead of in adjacent, often same-cache-line,
slots. For a reasonable chess program and a reasonable L2 cache
size, practically the only L2 cache misses during search happen
while probing the hash table, and several cache misses in a row
can easily cost hundreds of CPU ticks.
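
One way to keep the collision behavior of buckets without the
scattered misses is to make the whole bucket fit in one cache
line. A sketch, reusing the table above and assuming 64-byte
cache lines with entries padded so that four of them fill a
line (both assumptions about the target hardware, not anything
stated above):

    /* Probe the four entries sharing one cache line.  If the table
     * itself is 64-byte aligned, the first access loads the whole
     * bucket, so the loop causes at most one cache miss. */
    #define LINE_ENTRIES 4

    static HashEntry *probe_line(uint64_t key)
    {
        uint32_t base = ((uint32_t)key & (TABLE_SIZE - 1))
                        & ~(uint32_t)(LINE_ENTRIES - 1);
        for (int i = 0; i < LINE_ENTRIES; i++)
            if (table[base + i].key == 0 || table[base + i].key == key)
                return &table[base + i];
        return 0;
    }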

Eugene


