Computer Chess Club Archives


Subject: Re: best performance from rehashing schemes?

Author: martin fierz

Date: 13:41:49 12/13/01


On December 13, 2001 at 15:05:43, Robert Hyatt wrote:

>On December 13, 2001 at 06:07:37, Wylie Garvin wrote:
>
>>Hello,
>>
>>  Does anyone know what the best-performing rehashing scheme is?  I read
>>somewhere (Heinz) that multiple probes in one table perform better than separate
>>tables.  I've also heard that subtree size is best for choosing a node to
>>overwrite.  What does the board think?
>>
>>wylie
>
>
>Two answers.
>
>First, one based on theory.
>
>If you use the low-order bits of the hash signature as the table index,
>then one good approach, if you want to do N probes, is to use a
>_different_ set of bits from the signature as a "stride" or "offset"
>(stride is the vector-machine term most might recognize) and add this
>stride to the original index N-1 times to produce N hash addresses,
>each separated by the same stride.
>
>This is the best theoretical solution as it greatly reduces the chances
>for "chains" to form in memory.
>
>But for the PC, it has a problem.  If your hash entry is 16 bytes, eight
>probes as above will result in 8 cache line fills (reading 32 bytes per
>fill) which will swamp the PC memory bandwidth and kill performance.  I
>used this approach on the Cray, but with its vector hardware, there was
>no cost whatsoever for the extra 7 loads.
>
>Another problem: if you add some constant extracted from the original
>hash signature to the computed hash index, you can run off the end of the
>hash table.  You can fix this in either of two ways.  (1) after adding the
>offset, AND it with the original mask, which will "wrap it around" to the
>front of the table;  (2) add enough "slack" to the end of the table so that
>the offsets will still keep you "inside".  We had to do (2) on the Cray,
>because we used a "vector load" and there is no facility to "wrap around"
>as in (1).
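A minimal sketch of the stride scheme just described, using fix (1), the AND with the original mask, to wrap around. The table size, probe count, and bit-field positions are assumptions for illustration, not Crafty's or Cray Blitz's actual code:

```c
#include <stdint.h>

#define TABLE_BITS 16
#define TABLE_SIZE (1u << TABLE_BITS)
#define MASK       (TABLE_SIZE - 1)
#define PROBES     8

/* Index comes from the low bits of the signature; the stride comes
   from a different bit field.  Forcing the stride odd keeps it
   coprime with the power-of-two table size, so all PROBES addresses
   are distinct. */
void stride_addresses(uint64_t sig, uint32_t addr[PROBES]) {
  uint32_t index  = (uint32_t)sig & MASK;
  uint32_t stride = ((uint32_t)(sig >> TABLE_BITS) & MASK) | 1;
  for (uint32_t i = 0; i < PROBES; i++)
    addr[i] = (index + i * stride) & MASK;   /* wrap-around, fix (1) */
}
```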
>
>For the PC, multiple probes may still be a good idea.  Unfortunately, most of
>the papers you will see leave out the _critical_ performance data.  They might
>say "8 probes search a smaller tree", but what if that tree is 10% smaller
>while the extra memory bandwidth slows your program by 20%?  That is a net
>loss.
>
>In any case, for the PC, I would avoid the random offset explained above, and
>simply probe N consecutive table entries.  This at least uses only 1/2 the
>memory bandwidth since a single cache line fill of 32 bytes will now give you
>two entries rather than one.
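The consecutive-probe idea might look like the sketch below. The bucket size, entry layout, and the depth-based replacement rule are illustrative assumptions; aligning the base index to the bucket keeps the group from running off the end of the table:

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE (1u << 16)
#define BUCKET 4                 /* consecutive entries probed; assumed value */

typedef struct { uint64_t sig; int depth; int score; } Entry;
static Entry table[TABLE_SIZE];

/* Adjacent entries share cache lines, so the extra probes cost little
   extra memory bandwidth on a PC. */
Entry *probe(uint64_t sig) {
  uint32_t base = ((uint32_t)sig & (TABLE_SIZE - 1)) & ~(uint32_t)(BUCKET - 1);
  for (int i = 0; i < BUCKET; i++)
    if (table[base + i].sig == sig)
      return &table[base + i];
  return NULL;
}

void store(uint64_t sig, int depth, int score) {
  uint32_t base = ((uint32_t)sig & (TABLE_SIZE - 1)) & ~(uint32_t)(BUCKET - 1);
  Entry *victim = &table[base];
  for (int i = 1; i < BUCKET; i++)           /* overwrite shallowest entry */
    if (table[base + i].depth < victim->depth)
      victim = &table[base + i];
  victim->sig = sig; victim->depth = depth; victim->score = score;
}
```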
>
>Whether 8 is the right number or not should be tested.  We used 8 in Cray
>Blitz after testing 2, 4, 8, 16 and 32.  I use 2 in Crafty, by using the Belle
>approach of two separate hash tables.  For the PC, the right answer probably
>lies somewhere between 2 and 8.

i guess the size of the hashtable and your typical search depth would also be
important for the right number. the deeper you search, the more you might want
to protect your most valuable hash entries.
i'm using two tables, one for positions close to the root, and a larger one for
all other positions. in the small one i do an 8-consecutive-positions lookup, in
the larger one i always replace the current entry. this is what worked best for
my checkers program in terms of speed for typical search depths i want to see.
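The two-table scheme described here could be sketched roughly as below. All sizes, the ply threshold for "close to the root", and the replacement details are assumptions about the general idea, not martin's actual code; the slack entries past the end of the small table are fix (2) from the post above:

```c
#include <stdint.h>

#define SMALL_SIZE (1u << 12)    /* near-root table; size assumed */
#define BIG_SIZE   (1u << 16)    /* main table; size assumed */
#define NEAR_ROOT  4             /* ply threshold, assumed */
#define PROBES     8

typedef struct { uint64_t sig; int value; } Entry;
static Entry small_tbl[SMALL_SIZE + PROBES];  /* slack avoids wrap-around */
static Entry big_tbl[BIG_SIZE];

/* Near-root positions: scan 8 consecutive slots of the small table.
   Everything else: always replace in the big table. */
void store(uint64_t sig, int ply, int value) {
  if (ply <= NEAR_ROOT) {
    uint32_t base = (uint32_t)sig & (SMALL_SIZE - 1);
    for (uint32_t i = 0; i < PROBES; i++) {
      Entry *e = &small_tbl[base + i];
      if (e->sig == 0 || e->sig == sig) {     /* empty slot or same position */
        e->sig = sig; e->value = value;
        return;
      }
    }
    small_tbl[base].sig = sig;                /* all 8 full: replace first */
    small_tbl[base].value = value;
  } else {
    uint32_t idx = (uint32_t)sig & (BIG_SIZE - 1);
    big_tbl[idx].sig = sig;                   /* always replace */
    big_tbl[idx].value = value;
  }
}

int probe(uint64_t sig, int ply, int *value) {
  if (ply <= NEAR_ROOT) {
    uint32_t base = (uint32_t)sig & (SMALL_SIZE - 1);
    for (uint32_t i = 0; i < PROBES; i++)
      if (small_tbl[base + i].sig == sig) {
        *value = small_tbl[base + i].value;
        return 1;
      }
    return 0;
  }
  uint32_t idx = (uint32_t)sig & (BIG_SIZE - 1);
  if (big_tbl[idx].sig == sig) { *value = big_tbl[idx].value; return 1; }
  return 0;
}
```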

cheers
  martin




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.