Computer Chess Club Archives



Subject: Re: best performance from rehashing schemes?

Author: Brian Richardson

Date: 13:35:01 12/13/01


On December 13, 2001 at 15:37:14, Wylie Garvin wrote:

>On December 13, 2001 at 15:05:43, Robert Hyatt wrote:
>
>>On December 13, 2001 at 06:07:37, Wylie Garvin wrote:
>>
>>>Hello,
>>>
>>>  Does anyone know what the best-performing rehashing scheme is?  I read
>>>somewhere (Heinz) that multiple probes in one table perform better than separate
>>>tables.  I've also heard that subtree size is best for choosing a node to
>>>overwrite.  What does the board think?
>>>
>>>wylie
>>
>>
>>Two answers.
>>
>>First, one based on theory.
>>
>>If you use the low-order bits of the hash signature as the table index,
>>then one good approach, if you want to do multiple probes, is to use a
>>_different_ set of bits from the hash signature as a "stride" or
>>"offset" (stride is the vector-machine terminology most might recognize)
>>and add this stride to the original index N-1 times to produce N
>>hash addresses, each separated by that same stride.
>>
>>This is the best theoretical solution as it greatly reduces the chances
>>for "chains" to form in memory.
>>
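A minimal C sketch of the stride scheme just described. Everything here is
illustrative: the table size, the 16-byte entry layout, the probe count, and
which signature bits feed the stride are assumptions made for the sketch, not
details from Cray Blitz or any engine in this thread.

#include <stdint.h>

#define TABLE_BITS  20                       /* 2^20 entries (an assumption) */
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1)
#define NUM_PROBES  8                        /* the Cray Blitz number mentioned below */

typedef struct { uint64_t key; uint64_t data; } hash_entry;   /* 16 bytes */

hash_entry table[TABLE_SIZE];

hash_entry *probe(uint64_t sig) {
    /* index from the low-order bits of the signature */
    uint32_t index  = (uint32_t)(sig & TABLE_MASK);
    /* stride from a different set of signature bits, forced odd so it is
       coprime with the power-of-two table size and never revisits a slot */
    uint32_t stride = (uint32_t)((sig >> TABLE_BITS) & TABLE_MASK) | 1;
    for (int i = 0; i < NUM_PROBES; i++) {
        if (table[index].key == sig)
            return &table[index];
        /* ANDing with the mask wraps around to the front of the table */
        index = (index + stride) & TABLE_MASK;
    }
    return 0;   /* not found */
}
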
>Right.  This sounds like the answer I was looking for.  =)  Thanks Bob!
>
>>But for the PC, it has a problem.  If your hash entry is 16 bytes, eight
>>probes as above will result in 8 cache line fills (reading 32 bytes per
>>fill) which will swamp the PC memory bandwidth and kill performance.  I
>>used this approach on the Cray, but with its vector hardware, there was
>>no cost whatsoever for the extra 7 loads.
>>
>With the P4, I think cache lines are now 64 bytes.  I don't know whether this
>actually makes the problem worse or not.
>
>>Another problem is that if you add some constant extracted from the original
>>hash signature to the computed hash index, you can run off the end of the
>>hash table.  You can fix this in either of two ways:  (1) after adding the
>>offset, AND it with the original mask, which will "wrap it around" to the
>>front of the table;  (2) add enough "slack" to the end of the table so that
>>the offsets will still keep you "inside".  We had to do (2) on the Cray,
>>because we used a "vector load" and there is no facility to "wrap around"
>>as in (1).
>>
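Fix (2) as a sketch, with the same illustrative names as above. Capping the
stride at some STRIDE_MAX is an assumption made here so the slack stays small;
whatever bound Cray Blitz actually used is not stated in this thread.

#include <stdint.h>

#define TABLE_BITS  20
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1)
#define NUM_PROBES  8
#define STRIDE_MAX  8                        /* assumed bound on the stride */
#define SLACK       ((NUM_PROBES - 1) * STRIDE_MAX)

typedef struct { uint64_t key; uint64_t data; } hash_entry;

/* "slack" entries past the nominal end mean no probe can run off the
   allocation, so no wrap-around AND is needed (the vector-load case) */
hash_entry table[TABLE_SIZE + SLACK];

hash_entry *probe(uint64_t sig) {
    uint32_t index  = (uint32_t)(sig & TABLE_MASK);
    uint32_t stride = (uint32_t)((sig >> TABLE_BITS) & (STRIDE_MAX - 1)) + 1;  /* 1..STRIDE_MAX */
    for (int i = 0; i < NUM_PROBES; i++) {
        if (table[index].key == sig)
            return &table[index];
        index += stride;                     /* stays below TABLE_SIZE + SLACK */
    }
    return 0;
}
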
>>For the PC, multiple probes may still be a good idea.  Unfortunately, most of
>>the papers you will see leave out the _critical_ performance data.  They might
>>say "8 probes searches a smaller tree" but what if that smaller tree is 10%
>>smaller but the memory bandwidth required slows your program by 20%?  That is
>>a net loss.
>>
>>In any case, for the PC, I would avoid the random offset explained above, and
>>simply probe N consecutive table entries.  This at least uses only 1/2 the
>>memory bandwidth since a single cache line fill of 32 bytes will now give you
>>two entries rather than one.
>>
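A sketch of the consecutive-entry variant for the PC. Aligning the probe group
to a NUM_PROBES boundary (which requires a power-of-two probe count) is one way
to make the probes share cache lines; it effectively turns the table into fixed
buckets, and is a design choice of this sketch rather than anything prescribed
above.

#include <stdint.h>

#define TABLE_BITS  20
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1)
#define NUM_PROBES  4                        /* power of two, for the alignment trick */

typedef struct { uint64_t key; uint64_t data; } hash_entry;   /* 16 bytes */

hash_entry table[TABLE_SIZE];

hash_entry *probe(uint64_t sig) {
    /* align the group so each 32-byte line fill covers two 16-byte entries */
    uint32_t base = (uint32_t)(sig & TABLE_MASK) & ~(uint32_t)(NUM_PROBES - 1);
    for (uint32_t i = 0; i < NUM_PROBES; i++)
        if (table[base + i].key == sig)
            return &table[base + i];
    return 0;
}
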
>Yes, your scheme (2) with N consecutive probes is the scheme I came up with.  I
>didn't have a good justification for it other than cache locality.  Maybe that
>is all the justification it needs.  As well as one cache line holding two
>(four?) hash entries, there is another possibility: a hardware prefetching
>mechanism can bring in the next cache line while you're still working on this
>one.
>
>>Whether 8 is the right number or not should be tested.  We used 8 in Cray
>>Blitz after testing 2,4,8,16 and 32.  I use 2 in Crafty, by using the Belle
>>approach of two separate hash tables.  The right answer probably lies somewhere
>>between 2 and 8...  For the PC.
>
>I think Heinz mentions in one of his papers that 4 probes worked best for
>DarkThought.  Supposedly one table with N probes works better than N tables.
>But here's what I wonder...if I'm going to use N consecutive entries in one
>table, is that actually any better than N (interleaved) tables?  Intuition says
>maybe, but not much.  Also I suppose without the vector support there's no
>reason for N to be a power of two.
>
>wylie

Most of the data I recall (ICCA articles) showing multi-probe advantages
utilized relatively small hash tables (by today's standards).  I doubt that
with upwards of even a million entries (never mind 16 million or more),
multiple probes buy much.  I think it really depends on the program and its
unique mix of search techniques, just as with the question of whether q-search
hashing pays off or not: yes for some engines, no for others.  Last time I
tried various multi-probe schemes with Tinker (depth-preferred & always-replace,
2 tables, +n probes), all seemed to be net losers.  Of course, I might not have
managed to implement a good multi-probe (or multi-table) approach...
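
For reference, a minimal sketch of the Belle-style two-table scheme mentioned
above (one depth-preferred table plus one always-replace table). The
replacement test shown is the common textbook form; whether Crafty or Tinker
do exactly this is not stated in the thread.

#include <stdint.h>

#define TABLE_BITS  20
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1)

typedef struct { uint64_t key; int32_t depth; int32_t score; } tt_entry;

/* Belle-style pair: one table keeps the deeper search, one always replaces */
tt_entry depth_pref[TABLE_SIZE];
tt_entry always_rep[TABLE_SIZE];

void tt_store(uint64_t sig, int32_t depth, int32_t score) {
    uint32_t ix = (uint32_t)(sig & TABLE_MASK);
    tt_entry e = { sig, depth, score };
    if (depth >= depth_pref[ix].depth)       /* deeper result displaces shallower */
        depth_pref[ix] = e;
    else
        always_rep[ix] = e;                  /* shallower result is still kept */
}

tt_entry *tt_probe(uint64_t sig) {
    uint32_t ix = (uint32_t)(sig & TABLE_MASK);
    if (depth_pref[ix].key == sig) return &depth_pref[ix];
    if (always_rep[ix].key == sig) return &always_rep[ix];
    return 0;
}

Some versions write the always-replace table unconditionally rather than only
on the else branch; either way the pairing of depth-preferred with
always-replace is the point.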


