Author: Brian Richardson
Date: 13:35:01 12/13/01
On December 13, 2001 at 15:37:14, Wylie Garvin wrote:

>On December 13, 2001 at 15:05:43, Robert Hyatt wrote:
>
>>On December 13, 2001 at 06:07:37, Wylie Garvin wrote:
>>
>>>Hello,
>>>
>>>Does anyone know what the best-performing rehashing scheme is? I read
>>>somewhere (Heinz) that multiple probes in one table perform better than
>>>separate tables. I've also heard that subtree size is best for choosing
>>>a node to overwrite. What does the board think?
>>>
>>>wylie
>>
>>Two answers.
>>
>>First, one based on theory.
>>
>>If you use the low-order N bits of the hash signature as the table index,
>>then one good approach, if you want to do multiple probes, is to use a
>>_different_ set of bits from the hash signature as a "stride" or "offset"
>>(stride is the vector-machine terminology most might recognize) and add
>>this stride to the original index value N-1 times to produce N hash
>>addresses, each separated by the same specific stride.
>>
>>This is the best theoretical solution, as it greatly reduces the chance
>>for "chains" to form in memory.
>
>Right. This sounds like the answer I was looking for. =) Thanks Bob!
>
>>But for the PC, it has a problem. If your hash entry is 16 bytes, eight
>>probes as above will result in 8 cache-line fills (reading 32 bytes per
>>fill), which will swamp the PC's memory bandwidth and kill performance.
>>I used this approach on the Cray, but with its vector hardware there was
>>no cost whatsoever for the extra 7 loads.
>
>With the P4, I think cache lines are now 64 bytes. I don't know whether
>that actually makes the problem worse or not.
>
>>Another problem is that if you add some constant, extracted from the
>>original hash signature, to the computed hash index, you can run off the
>>end of the hash table. You can fix this in either of two ways: (1) after
>>adding the offset, AND it with the original mask, which will "wrap it
>>around" to the front of the table; or (2) add enough "slack" to the end
>>of the table so that the offsets still keep you "inside". We had to do
>>this on the Cray, because we used a "vector load" and there is no
>>facility to "wrap around" as in (1).
>>
>>For the PC, multiple probes may still be a good idea. Unfortunately,
>>most of the papers you will see leave out the _critical_ performance
>>data. They might say "8 probes search a smaller tree", but what if that
>>smaller tree is 10% smaller while the extra memory bandwidth slows your
>>program by 20%? That is a net loss.
>>
>>In any case, for the PC, I would avoid the random offset explained above
>>and simply probe N consecutive table entries. This uses only half the
>>memory bandwidth, since a single cache-line fill of 32 bytes now gives
>>you two entries rather than one.
>
>Yes, your scheme (2) with N consecutive probes is the scheme I came up
>with. I didn't have a good justification for it other than cache
>locality; maybe that is all the justification it needs. As well as one
>cache line holding two (four?) hash entries, there is another
>possibility: a hardware prefetching mechanism can bring in the next cache
>line while you're still working on this one.
>
>>Whether 8 is the right number or not should be tested. We used 8 in Cray
>>Blitz after testing 2, 4, 8, 16 and 32. I use 2 in Crafty, via the Belle
>>approach of two separate hash tables. The right answer probably lies
>>somewhere between 2 and 8... for the PC.
>
>I think Heinz mentions in one of his papers that 4 probes worked best for
>DarkThought. Supposedly one table with N probes works better than N
>tables. But here's what I wonder: if I'm going to use N consecutive
>entries in one table, is that actually any better than N (interleaved)
>tables? Intuition says maybe, but not much. Also, I suppose without the
>vector support there's no reason for N to be a power of two.
>
>wylie
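For concreteness, here is a rough C sketch of the stride scheme Bob
describes above. All names and sizes are invented for illustration (this
is not Crafty's or Cray Blitz's actual code), and it assumes a
power-of-two table so the mask trick from fix (1) works:

#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS  20                     /* 2^20 = ~1M entries            */
#define TABLE_SIZE  (1u << TABLE_BITS)
#define TABLE_MASK  (TABLE_SIZE - 1)
#define NUM_PROBES  8                      /* Cray Blitz settled on 8       */

typedef struct {
    uint64_t signature;                    /* full key, to verify a hit     */
    /* ... score, draft, best move, etc. ... */
} hash_entry;

static hash_entry table[TABLE_SIZE];

static hash_entry *probe(uint64_t key)
{
    unsigned index  = (unsigned)key & TABLE_MASK;       /* low-order bits   */
    /* stride comes from a _different_ set of bits; forcing it odd makes it
       coprime to the power-of-two table size, so all NUM_PROBES addresses
       are guaranteed distinct */
    unsigned stride = ((unsigned)(key >> TABLE_BITS) & TABLE_MASK) | 1;
    for (int i = 0; i < NUM_PROBES; i++) {
        if (table[index].signature == key)
            return &table[index];
        index = (index + stride) & TABLE_MASK;          /* wrap: fix (1)    */
    }
    return NULL;                                        /* miss             */
}

The final AND with TABLE_MASK is fix (1) from the quote: it wraps an index
that ran off the end back to the front of the table, so no "slack" entries
are needed.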
Most of the data I recall (ICCA articles) showing multi-probe advantages
used relatively small hash tables by today's standards. I doubt that with
upwards of even a million entries (never mind 16 million or more),
multiple probes buy much.

I think it really depends on the program and its unique mix of search
techniques, just like the question of whether q-search hashing pays off:
yes for some engines, no for others. The last time I tried various
multi-probe schemes with Tinker (depth-preferred & always-replace, two
tables, +n probes), all of them seemed to be net losers. Of course, I
might not have managed to implement a good multi-probe (or multi-table)
approach...
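And here, for comparison, is a sketch of the cache-friendly alternative:
N consecutive entries forming a "cluster" that shares one cache line. The
entry layout and the shallowest-draft replacement rule are assumptions for
illustration; replacing by smallest subtree size, as mentioned above, is
another option:

#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS  20
#define TABLE_SIZE  (1u << TABLE_BITS)
#define CLUSTER     4u        /* 4 x 16-byte entries = one 64-byte P4 line  */

typedef struct {
    uint64_t signature;       /* full key, to verify a hit                  */
    int16_t  score;
    uint8_t  draft;           /* depth of the search that stored the entry  */
    uint8_t  flags;
    uint32_t best_move;
} hash_entry;                 /* 16 bytes, no padding                       */

static hash_entry table[TABLE_SIZE];

/* Probe CLUSTER consecutive slots, aligned down to a cluster boundary so
   that (given a line-aligned table) one cache-line fill covers them all. */
static hash_entry *probe_cluster(uint64_t key)
{
    unsigned base = ((unsigned)key & (TABLE_SIZE - 1)) & ~(CLUSTER - 1);
    for (unsigned i = 0; i < CLUSTER; i++)
        if (table[base + i].signature == key)
            return &table[base + i];
    return NULL;
}

/* On a store, overwrite the slot holding the shallowest draft. */
static hash_entry *replacement_victim(uint64_t key)
{
    unsigned base = ((unsigned)key & (TABLE_SIZE - 1)) & ~(CLUSTER - 1);
    hash_entry *victim = &table[base];
    for (unsigned i = 1; i < CLUSTER; i++)
        if (table[base + i].draft < victim->draft)
            victim = &table[base + i];
    return victim;
}

Aligning the probe down to a cluster boundary means all four slots arrive
with a single line fill, which is the whole point of the scheme; probing
four unaligned consecutive entries could straddle two lines.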