Author: Robert Hyatt
Date: 12:05:43 12/13/01
On December 13, 2001 at 06:07:37, Wylie Garvin wrote:

>Hello,
>
>  Does anyone know what the best-performing rehashing scheme is? I read
>somewhere (Heinz) that multiple probes in one table perform better than
>separate tables. I've also heard that subtree size is best for choosing a
>node to overwrite. What does the board think?
>
>wylie

Two answers. First, one based on theory. If you use the low-order bits of the hash signature as the table index, then one good approach, if you want to do multiple probes, is to use a _different_ set of bits from the hash signature as a "stride" or "offset" ("stride" is the vector-machine terminology most might recognize) and add this stride to the original index N-1 times, producing N hash addresses in all, each separated by that same stride. This is the best theoretical solution, as it greatly reduces the chance for "chains" to form in memory.

But for the PC it has a problem. If your hash entry is 16 bytes, eight probes as above will result in 8 cache-line fills (reading 32 bytes per fill), which will swamp the PC memory bandwidth and kill performance. I used this approach on the Cray, but with its vector hardware there was no cost whatsoever for the extra 7 loads.

Another problem is that if you add some constant extracted from the original hash signature to the computed hash index, you can run off the end of the hash table. You can fix this in either of two ways: (1) after adding the offset, AND it with the original mask, which will "wrap it around" to the front of the table; (2) add enough "slack" to the end of the table so that the offsets still keep you "inside". We had to use (2) on the Cray, because we used a "vector load" and there is no facility to "wrap around" as in (1).

For the PC, multiple probes may still be a good idea. Unfortunately, most of the papers you will see leave out the _critical_ performance data. They might say "8 probes searches a smaller tree", but what if that smaller tree is 10% smaller while the extra memory bandwidth slows your program by 20%? That is a net loss.

In any case, for the PC I would avoid the random offset explained above and simply probe N consecutive table entries. This uses only half the memory bandwidth, since a single cache-line fill of 32 bytes now gives you two entries rather than one. Whether 8 is the right number should be tested; we used 8 in Cray Blitz after testing 2, 4, 8, 16 and 32. I use 2 in Crafty, via the Belle approach of two separate hash tables. The right answer probably lies somewhere between 2 and 8... for the PC.
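To make the stride idea concrete, here is a minimal C sketch. All the names (HashEntry, TABLE_BITS, NPROBES, probe_stride) are mine for illustration, not from Crafty or Cray Blitz; it assumes a power-of-two table and uses the AND-mask wrap-around of fix (1) above. Forcing the stride odd is a detail I have added so the probe sequence can never revisit the same slot:

    #include <stdint.h>

    #define TABLE_BITS 20                       /* 2^20 entries, for example */
    #define TABLE_SIZE (1u << TABLE_BITS)
    #define TABLE_MASK (TABLE_SIZE - 1)
    #define NPROBES    8

    typedef struct {
        uint64_t key;    /* full signature, for verification on probe  */
        int      draft;  /* remaining search depth of the stored entry */
        /* ... score, bound type, best move, etc. ...                  */
    } HashEntry;

    HashEntry table[TABLE_SIZE];

    HashEntry *probe_stride(uint64_t sig) {
        uint32_t index  = (uint32_t)(sig & TABLE_MASK);    /* low-order bits */
        /* a *different* set of bits becomes the stride; forcing it odd
           keeps it coprime to the power-of-two table size               */
        uint32_t stride = (uint32_t)((sig >> TABLE_BITS) & TABLE_MASK) | 1;
        for (int i = 0; i < NPROBES; i++) {
            HashEntry *h = &table[index];
            if (h->key == sig)                             /* hit            */
                return h;
            index = (index + stride) & TABLE_MASK;         /* wrap, fix (1)  */
        }
        return 0;                                          /* miss           */
    }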
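The consecutive-probe variant for the PC is even simpler; this sketch reuses the declarations above. Adjacent entries share cache lines, which is the whole point:

    HashEntry *probe_consecutive(uint64_t sig) {
        uint32_t index = (uint32_t)(sig & TABLE_MASK);
        for (int i = 0; i < NPROBES; i++) {
            /* neighboring entries arrive in the same cache-line fills */
            HashEntry *h = &table[(index + i) & TABLE_MASK];
            if (h->key == sig)
                return h;
        }
        return 0;
    }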
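Finally, a sketch of the Belle-style two-table arrangement mentioned for Crafty: one depth-preferred table and one always-replace table, each probed once. The replacement policy shown (keep the deeper draft, dump everything else into the second table) is the commonly described Belle scheme, not necessarily Crafty's exact code:

    HashEntry depth_table[TABLE_SIZE];    /* keeps the deepest entries  */
    HashEntry always_table[TABLE_SIZE];   /* overwritten unconditionally */

    HashEntry *probe_two_tables(uint64_t sig) {
        uint32_t index = (uint32_t)(sig & TABLE_MASK);
        if (depth_table[index].key == sig)
            return &depth_table[index];
        if (always_table[index].key == sig)
            return &always_table[index];
        return 0;
    }

    void store_two_tables(const HashEntry *e) {
        uint32_t index = (uint32_t)(e->key & TABLE_MASK);
        if (e->draft >= depth_table[index].draft)
            depth_table[index] = *e;      /* deeper search wins          */
        else
            always_table[index] = *e;     /* always-replace fallback     */
    }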