Computer Chess Club Archives


Subject: Re: best performance from rehashing schemes?

Author: Wylie Garvin

Date: 12:37:14 12/13/01


On December 13, 2001 at 15:05:43, Robert Hyatt wrote:

>On December 13, 2001 at 06:07:37, Wylie Garvin wrote:
>
>>Hello,
>>
>>  Does anyone know what the best-performing rehashing scheme is?  I read
>>somewhere (Heinz) that multiple probes in one table perform better than separate
>>tables.  I've also heard that subtree size is best for choosing a node to
>>overwrite.  What does the board think?
>>
>>wylie
>
>
>Two answers.
>
>First, one based on theory.
>
>If you use the low order bits of the hash signature as the table index,
>then one good approach, if you want to do N probes, is to
>use a _different_ set of bits from the hash signature as a "stride" or
>"offset" (stride is the vector machine terminology most might recognize)
>and add this stride to the original index value N-1 times to produce N
>hash addresses, each separated by the same specific stride.
>
>This is the best theoretical solution as it greatly reduces the chances
>for "chains" to form in memory.
>
Right.  This sounds like the answer I was looking for.  =)  Thanks Bob!
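
For anyone following along, here is roughly how I picture that in C.  Just a
sketch: the names, the 64-bit signature type, and the 16-byte entry layout are
my own assumptions, and the final AND with the mask is the wrap-around fix (1)
you describe further down.

  /* Stride-based rehashing: index from the low bits, stride from a
     different set of bits, N_PROBES addresses each one stride apart. */

  #define ENTRIES   (1 << 22)            /* table size, a power of two       */
  #define MASK      (ENTRIES - 1)
  #define N_PROBES  8

  typedef struct {
    unsigned long long key;              /* full signature, for verification */
    unsigned long long data;             /* score, draft, move, flags...     */
  } hash_entry;                          /* 16 bytes                         */

  hash_entry table[ENTRIES];

  hash_entry *probe_stride(unsigned long long sig) {
    /* index comes from the low bits, stride from a different set of bits;
       the "| 1" (my own tweak) forces the stride odd so the N_PROBES
       addresses are guaranteed distinct in a power-of-two table.           */
    unsigned int index  = (unsigned int)(sig & MASK);
    unsigned int stride = ((unsigned int)(sig >> 32) & MASK) | 1;
    int i;

    for (i = 0; i < N_PROBES; i++) {
      hash_entry *e = &table[(index + i * stride) & MASK];  /* wrap with the mask */
      if (e->key == sig)
        return e;
    }
    return 0;                            /* miss */
  }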

>But for the PC, it has a problem.  If your hash entry is 16 bytes, eight
>probes as above will result in 8 cache line fills (reading 32 bytes per
>fill) which will swamp the PC memory bandwidth and kill performance.  I
>used this approach on the Cray, but with its vector hardware, there was
>no cost whatsoever for the extra 7 loads.
>
With the P4, I think cache lines are now 64 bytes.  I don't know whether this
actually makes the problem worse or not.
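
To put rough numbers on it (assuming 16-byte entries): with 64-byte lines, 8
probes at random strides still mean 8 line fills, but now 8 x 64 = 512 bytes
of traffic instead of 8 x 32 = 256, so the scattered scheme costs twice the
bandwidth for the same work.  Eight consecutive entries, on the other hand,
span only two 64-byte lines (three if the run straddles a boundary), down from
four 32-byte lines.  So bigger lines make the scattered scheme relatively
worse and the consecutive scheme relatively better.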

>Another problem is that if you add some constant extracted from the original
>hash signature, to the computed hash index, you can run off the end of the
>hash table.  You can fix this in either of two ways.  (1) after adding the
>offset, AND it with the original mask, which will "wrap it around" to the
>front of the table;  (2) add enough "slack" to the end of the table so that
>the offsets will still keep you "inside".  We had to do this on the Cray,
>because we used a "vector load" and there is no facility to "wrap around"
>as in (1).
>
>For the PC, multiple probes may still be a good idea.  Unfortunately, most of
>the papers you will see leave out the _critical_ performance data.  They might
>say "8 probes searches a smaller tree" but what if that smaller tree is 10%
>smaller but the memory bandwidth required slows your program by 20%?  That is
>a net loss.
>
>In any case, for the PC, I would avoid the random offset explained above, and
>simply probe N consecutive table entries.  This at least uses only 1/2 the
>memory bandwidth since a single cache line fill of 32 bytes will now give you
>two entries rather than one.
>
Yes, your scheme (2) with N consecutive probes is the scheme I came up with.  I
didn't have a good justification for it other than cache locality.  Maybe that
is all the justification it needs.  Besides one cache line holding two
(four?) hash entries, there is another benefit: a hardware prefetching
mechanism can bring in the next cache line while you're still working on this
one.
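
In code the consecutive-probe version is about as simple as it gets; again
just a sketch with my own names, using your fix (2), a few slack entries past
the end of the table, so the run never needs to wrap:

  /* N consecutive probes starting at the hashed index.  The N_PROBES-1
     extra entries at the end are "slack" so index+i never runs off the
     table, and adjacent entries share cache lines (and give a hardware
     prefetcher an easy pattern to follow).                              */

  #define ENTRIES   (1 << 22)
  #define MASK      (ENTRIES - 1)
  #define N_PROBES  4

  typedef struct {
    unsigned long long key;
    unsigned long long data;             /* score, draft, move, flags... */
  } hash_entry;

  hash_entry table[ENTRIES + N_PROBES - 1];

  hash_entry *probe_consecutive(unsigned long long sig) {
    hash_entry *e = &table[sig & MASK];
    int i;

    for (i = 0; i < N_PROBES; i++, e++)
      if (e->key == sig)
        return e;
    return 0;                            /* miss */
  }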

>Whether 8 is the right number or not should be tested.  We used 8 in Cray
>Blitz after testing 2, 4, 8, 16 and 32.  I use 2 in Crafty, via the Belle
>approach of two separate hash tables.  The right answer probably lies somewhere
>between 2 and 8... for the PC.

I think Heinz mentions in one of his papers that 4 probes worked best for
DarkThought.  Supposedly one table with N probes works better than N tables.
But here's what I wonder... if I'm going to use N consecutive entries in one
table, is that actually any better than N (interleaved) tables?  Intuition says
maybe, but not by much.  Also, I suppose that without vector support there's no
reason for N to be a power of two.
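
Actually, writing it down answers part of my own question: if the N "tables"
are interleaved entry by entry and indexed from the same bits, a probe touches
N consecutive entries either way, so the memory traffic is essentially the
same.  The real differences are that the interleaved runs are aligned to
N-entry boundaries (so with 16-byte entries and N = 4 a run is exactly one
64-byte line, and runs from different indices never overlap), and that each
slot of the run can carry its own replacement rule, Belle-style.  A sketch of
the two index computations (my own notation, not from any engine):

  /* One table with N consecutive probes vs. N tables interleaved
     entry by entry.  Same bytes touched; only alignment and overlap
     of the runs differ.                                              */

  #define ENTRIES      (1 << 22)
  #define N_PROBES     4
  #define MASK         (ENTRIES - 1)               /* one-table indexing     */
  #define BUCKET_MASK  (ENTRIES / N_PROBES - 1)    /* interleaved indexing   */

  /* run starting anywhere; runs from neighbouring indices overlap
     (needs N_PROBES-1 slack entries, as in the sketch above)         */
  unsigned int index_one_table(unsigned long long sig, int i) {
    return (unsigned int)(sig & MASK) + i;
  }

  /* run aligned to an N_PROBES boundary; runs never overlap, and can
     be aligned to a cache line                                       */
  unsigned int index_interleaved(unsigned long long sig, int i) {
    return (unsigned int)(sig & BUCKET_MASK) * N_PROBES + i;
  }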

wylie


