Author: Robert Hyatt
Date: 12:05:43 12/13/01
On December 13, 2001 at 06:07:37, Wylie Garvin wrote:

>Hello,
>
>  Does anyone know what the best-performing rehashing scheme is? I read
>somewhere (Heinz) that multiple probes in one table perform better than
>separate tables. I've also heard that subtree size is best for choosing a
>node to overwrite. What does the board think?
>
>wylie

Two answers. First, one based on theory. If you use the low-order bits of the hash signature as the table index, then one good approach, if you want to do multiple probes, is to use a _different_ set of bits from the hash signature as a "stride" or "offset" ("stride" is the vector-machine terminology most might recognize) and add this stride to the original index N-1 times, producing N hash addresses in all, each separated by that same stride. This is the best theoretical solution, as it greatly reduces the chance for "chains" to form in memory.

But for the PC it has a problem. If your hash entry is 16 bytes, eight probes as above will result in 8 cache-line fills (reading 32 bytes per fill), which will swamp the PC memory bandwidth and kill performance. I used this approach on the Cray, but with its vector hardware there was no cost whatsoever for the extra 7 loads.

Another problem is that if you add some constant extracted from the original hash signature to the computed hash index, you can run off the end of the hash table. You can fix this in either of two ways: (1) after adding the offset, AND it with the original mask, which will "wrap it around" to the front of the table; (2) add enough "slack" to the end of the table so that the offsets still keep you "inside". We had to use (2) on the Cray, because we used a "vector load" and there is no facility to "wrap around" as in (1).

For the PC, multiple probes may still be a good idea. Unfortunately, most of the papers you will see leave out the _critical_ performance data. They might say "8 probes searches a smaller tree", but what if that smaller tree is 10% smaller while the extra memory bandwidth slows your program by 20%? That is a net loss.

In any case, for the PC I would avoid the random offset explained above and simply probe N consecutive table entries. This uses only half the memory bandwidth, since a single cache-line fill of 32 bytes now gives you two entries rather than one. Whether 8 is the right number should be tested; we used 8 in Cray Blitz after testing 2, 4, 8, 16 and 32. I use 2 in Crafty, via the Belle approach of two separate hash tables. The right answer probably lies somewhere between 2 and 8... for the PC.
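To make the stride idea concrete, here is a minimal C sketch. All the names (HashEntry, TABLE_BITS, NPROBES, probe_stride) are mine for illustration, not from Crafty or Cray Blitz; it assumes a power-of-two table and uses the AND-mask wrap-around of fix (1) above. Forcing the stride odd is a detail I have added so the probe sequence can never revisit the same slot:

    #include <stdint.h>

    #define TABLE_BITS 20                       /* 2^20 entries, for example */
    #define TABLE_SIZE (1u << TABLE_BITS)
    #define TABLE_MASK (TABLE_SIZE - 1)
    #define NPROBES    8

    typedef struct {
        uint64_t key;    /* full signature, for verification on probe  */
        int      draft;  /* remaining search depth of the stored entry */
        /* ... score, bound type, best move, etc. ...                  */
    } HashEntry;

    HashEntry table[TABLE_SIZE];

    HashEntry *probe_stride(uint64_t sig) {
        uint32_t index  = (uint32_t)(sig & TABLE_MASK);    /* low-order bits */
        /* a *different* set of bits becomes the stride; forcing it odd
           keeps it coprime to the power-of-two table size               */
        uint32_t stride = (uint32_t)((sig >> TABLE_BITS) & TABLE_MASK) | 1;
        for (int i = 0; i < NPROBES; i++) {
            HashEntry *h = &table[index];
            if (h->key == sig)                             /* hit            */
                return h;
            index = (index + stride) & TABLE_MASK;         /* wrap, fix (1)  */
        }
        return 0;                                          /* miss           */
    }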
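The consecutive-probe variant for the PC is even simpler; this sketch reuses the declarations above. Adjacent entries share cache lines, which is the whole point:

    HashEntry *probe_consecutive(uint64_t sig) {
        uint32_t index = (uint32_t)(sig & TABLE_MASK);
        for (int i = 0; i < NPROBES; i++) {
            /* neighboring entries arrive in the same cache-line fills */
            HashEntry *h = &table[(index + i) & TABLE_MASK];
            if (h->key == sig)
                return h;
        }
        return 0;
    }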
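Finally, a sketch of the Belle-style two-table arrangement mentioned for Crafty: one depth-preferred table and one always-replace table, each probed once. The replacement policy shown (keep the deeper draft, dump everything else into the second table) is the commonly described Belle scheme, not necessarily Crafty's exact code:

    HashEntry depth_table[TABLE_SIZE];    /* keeps the deepest entries  */
    HashEntry always_table[TABLE_SIZE];   /* overwritten unconditionally */

    HashEntry *probe_two_tables(uint64_t sig) {
        uint32_t index = (uint32_t)(sig & TABLE_MASK);
        if (depth_table[index].key == sig)
            return &depth_table[index];
        if (always_table[index].key == sig)
            return &always_table[index];
        return 0;
    }

    void store_two_tables(const HashEntry *e) {
        uint32_t index = (uint32_t)(e->key & TABLE_MASK);
        if (e->draft >= depth_table[index].draft)
            depth_table[index] = *e;      /* deeper search wins          */
        else
            always_table[index] = *e;     /* always-replace fallback     */
    }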