Author: Robert Hyatt
Date: 15:08:21 12/13/01
Go up one level in this thread
On December 13, 2001 at 15:37:14, Wylie Garvin wrote: >On December 13, 2001 at 15:05:43, Robert Hyatt wrote: > >>On December 13, 2001 at 06:07:37, Wylie Garvin wrote: >> >>>Hello, >>> >>> Does anyone know what the best-performing rehashing scheme is? I read >>>somewhere (Heinz) that multiple probes in one table perform better than separate >>>tables. I've also heard that subtree size is best for choosing a node to >>>overwrite. What does the board think? >>> >>>wylie >> >> >>Two answers. >> >>First, one based on theory. >> >>If you use the low order N bits of the hash signature as the table index, >>then one good approach to use, if you want to do multiple probes, is to >>use a _different_ set of bits from the hash signature as a "stride" or >>"offset" (stride is the vector machine terminology most might recognize) >>and add this stride to the original index value N-1 times to produce N >>hash addresses, each separated by the same specific stride. >> >>This is the best theoretical solution as it greatly reduces the chances >>for "chains" to form in memory. >> >Right. This sounds like the answer I was looking for. =) Thanks Bob! > >>But for the PC, it has a problem. If your hash entry is 16 bytes, eight >>probes as above will result in 8 cache line fills (reading 32 bytes per >>fill) which will swamp the PC memory bandwidth and kill performance. I >>used this approach on the Cray, but with its vector hardware, there was >>no cost whatsoever for the extra 7 loads. >> >With the P4, I think cache lines are now 64 bytes. I don't know whether this >actually makes the problem worse or not. I haven't looked so I don't know. but 8 memory cycles to fill 64 bytes sure sounds ugly. Particularly for random addressing patterns. > >>Another problem is that if you add some constant extracted from the original >>hash signature, to the computed hash index, you can run off the end of the >>hash table. You can fix this either of two ways. (1) after adding the >>offset, AND it with the original mask, which will "wrap it around" to the >>front of the table; (2) add enough "slack" to the end of the table so that >>the offsets will still keep you "inside". We had to do this on the cray, >>because we used a "vector load" and there is no facility to "wrap around" >>as in (1). >> >>For the PC, multiple probes may still be a good idea. Unfortunately, most of >>the papers you will see leave out the _critical_ performance data. They might >>say "8 probes searches a smaller tree" but what if that smaller tree is 10% >>smaller but the memory bandwidth required slows your program by 20%? That is >>a net loss. >> >>In any case, for the PC, I would avoid the random offset explained above, and >>simply probe N consecutive table entries. This at least uses only 1/2 the >>memory bandwidth since a single cache line fill of 32 bytes will now give you >>two entries rather than one. >> >Yes, your scheme (2) with N consecutive probes is the scheme I came up with. I >didn't have a good justification for it other than cache locality. Maybe that >is all the justification it needs. As well as one cache line holding two >(four?) hash entries, there is another possibility: a hardware prefetching >mechanism can bring in the next cache line while you're still working on this >one. Prefetching was done on the cray. But there is a danger. If you prefetch, you must replace something already there. And if you prefetch something that is not used, and that causes a replacement for something you will need, the cost is high. > >>Whether 8 is the right number or not should be tested. We used 8 in Cray >>Blitz after testing 2,4,8,16 and 32. I use 2 in Crafty, by using the Belle >>approach of two separate hash tables. The right answer probably lies somewhere >>between 2 and 8... For the PC. > >I think Heinz mentions in one of his papers that 4 probes worked best for >DarkThought. Supposedly one table with N probes works better than N tables. >But here's what I wonder...if I'm going to use N consecutive entries in one >table, is that actually any better than N (interleaved) tables? Intuition says >maybe, but not much. Also I suppose without the vector support there's no >reason for N to be a power of two. > >wylie N interleaved tables would not be cache-friendly. On a Cray, it would not make any difference one way or the other. But on the PC, it would most likely hurt in a measurable way.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.