Computer Chess Club Archives


Subject: Re: Data structures question for hash tables

Author: Amir Ban

Date: 00:38:03 01/21/99



On January 20, 1999 at 20:17:04, Eugene Nalimov wrote:

>On January 20, 1999 at 16:17:53, Bruce Moreland wrote:
>
>>
>>On January 19, 1999 at 14:16:54, Eugene Nalimov wrote:
>>
>>>Linear rehash results in large clusters of non-empty elements.
>>>An element that conflicts with *any* element in the cluster
>>>will be added to that cluster, so a big cluster gets bigger
>>>and bigger over time. With non-linear rehash, conflicting
>>>elements will be added to other clusters. You can find
>>>discussion, analysis, and experimental results in
>>>
>>>  D. Knuth, The Art of Computer Programming, Vol.3,
>>>            2nd edition, pp. 526-531.
>>>
>>>(the data is there in the 1st edition, too, but page numbers
>>>are different).
>>>
>>>At least that will happen with a classic hash table, without a limit
>>>on the # of retries. When there is such a limit I don't know what
>>>will happen, but I suspect that when that limit is large enough
>>>(e.g. 8), non-linear rehash will be better (in terms of hash-table
>>>collisions), too - almost all assumptions are the same as in
>>>the classic case.
>>
>>I read this after I responded to Bob.  I will assume Knuth knows what he is
>>talking about, although I still don't understand intuitively why the degradation
>>on *average* won't be the same in both cases, since rather than rarely hitting a
>>really bad case, you are more commonly hitting less-bad cases.
>>
>>Unfortunately this won't help me either way since I don't rehash.
>>
>>bruce
>
>Ok. Here is a simplified example. Let's assume that we have a
>large enough hash table, and there are 20 non-empty entries.
>Now assume that we search each of the keys an equal number of
>times.
>
>Case 1. There are 10 clusters, each of length 2. Then we
>have to probe 1.5 entries on average during the search (in
>half of the searches we'll find the entry on the first try,
>in the other half on the second try).
>
>Case 2. There is 1 cluster that contains all 20 entries.
>Here the average number of probes is (1 + 2 + ... + 20) / 20 = 10.5.
>
>Eugene
>

This only explains why clustering is bad. It doesn't explain why complex
rehashing gets less clustering than sequential rehashing (i.e. adding 1). If you
do complex rehashing, you don't care about clusters of adjacent cells, but you
do care about clusters of cells that are adjacent according to your complicated
formula, and there's no obvious reason why you would do better.
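
For what it's worth, the point is easy to measure. The sketch below is purely
illustrative (not taken from any engine; the table size, the load factor and
the strides are arbitrary choices for this example): it fills an open-addressed
table to about 75% with pseudo-random keys, once with a rehash stride of 1
(sequential) and once with a fixed stride of 3, and prints the average number
of probes needed to find a stored entry.

/* Illustrative sketch only: compare average successful-lookup cost for
   sequential rehash (stride 1) vs. a fixed stride of 3. */
#include <stdio.h>
#include <stdlib.h>

#define SIZE 65536              /* table size; 3 is coprime with it */
#define FILL 49152              /* 75% load factor                  */

static double avg_probes(int stride)
{
    static char used[SIZE];
    unsigned long probes = 0;
    int i;

    for (i = 0; i < SIZE; i++)
        used[i] = 0;

    for (i = 0; i < FILL; i++) {
        /* build a key from three rand() calls so all 16 low bits are
           random even where RAND_MAX is only 32767 */
        unsigned key = (unsigned)rand() ^ ((unsigned)rand() << 15)
                                        ^ ((unsigned)rand() << 30);
        unsigned h = key % SIZE;
        int n = 1;                       /* probes made so far       */
        while (used[h]) {                /* walk the probe sequence  */
            h = (h + stride) % SIZE;
            n++;
        }
        used[h] = 1;
        probes += n;    /* probes to insert == probes to find later  */
    }
    return (double)probes / FILL;
}

int main(void)
{
    printf("stride 1: %.2f probes per successful lookup\n", avg_probes(1));
    printf("stride 3: %.2f probes per successful lookup\n", avg_probes(3));
    return 0;
}

With a fixed stride that is coprime with the table size, the two runs come out
essentially identical, which is the point: a fixed stride only renames the
neighbours.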

Another way of looking at this is that adopting any deterministic rehashing
function is equivalent to doing sequential rehashing after some mapping of the
entries. E.g. if you rehash in increments of 3, you may as well have rearranged
your entries so that slots 3, 6, 9, ... (mod tablesize) become consecutive and
done sequential rehashing.
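
As a concrete toy check of that remapping (table size 16, stride 3 and home
slot 5 are arbitrary picks; 11 is the inverse of 3 modulo 16, since
3 * 11 = 33 = 1 mod 16), the little program below prints the stride-3 probe
sequence next to the sequential sequence over the relabeled slots, and the two
columns come out identical:

/* Toy check: probing h, h+3, h+6, ... in the original table is the
   same as probing sequentially after relabeling slot s as s*INV mod
   SIZE, provided the stride is coprime with the table size. */
#include <stdio.h>

#define SIZE   16
#define STRIDE 3
#define INV    11               /* STRIDE * INV == 1 (mod SIZE) */

int main(void)
{
    unsigned h = 5;             /* an arbitrary home slot */
    unsigned j;

    for (j = 0; j < SIZE; j++) {
        unsigned stride_probe = (h + j * STRIDE) % SIZE;     /* stride-3 walk      */
        unsigned relabeled    = (stride_probe * INV) % SIZE; /* same slot, renamed */
        unsigned sequential   = (h * INV + j) % SIZE;        /* sequential walk    */
        printf("probe %2u: slot %2u -> relabeled %2u, sequential %2u\n",
               j, stride_probe, relabeled, sequential);
    }
    return 0;
}

In other words, stride-3 rehashing in the original table is literally
sequential rehashing in a table whose slots have been renamed, so it inherits
exactly the same clustering behaviour.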

Obviously, if your hash function is good enough to start with, applying any
further rehashing on it by some fixed mapping won't improve it, just like
multiplying a random number by 17 and adding 24 doesn't make it more random.

Amir
