Author: Robert Hyatt
Date: 08:12:42 02/05/04
On February 04, 2004 at 17:34:33, C McClain Morris, Jr. wrote:

>How did you go about determining hash table size for Crafty on that hardware in
>the tournament? Is there a formula you use for determining hash table size for
>various time controls? And, by the way, congratulations.

You have gotten some good answers. Thought I would fill in with a couple of points that were missed.

1. Bigger is better, up to a point. What is that point? Hard to say. On the quad opteron, I was the only user, so I knew what was (and would be) going on while I used it to play chess. Since I knew I would not need to crank up some large application (i.e. Netscape, etc.) while playing chess, I could set hash to whatever I wanted without any worries. If I might have to run something else, then I would make sure that I didn't use so much memory for hashing that a big application would make enough memory demands to suddenly cause paging in the middle of a game.

2. There is little point in going _too_ big. It won't hurt, in general, but it won't help either. And if it causes interference when you run some other application on rare occasions, it might be good to avoid trying to use all the memory you can for hash.

3. Further, going too big _can_ hurt performance. The main thing is the filesystem cache, i.e. for EGTBs. No matter what EGTB cache size you use, the O/S will still continue to buffer/cache filesystem data as it is used, which will make the EGTB accesses go much faster. If you grab most of memory for hash, and you don't need anywhere near that much, you simply deprive the O/S of a valuable resource (memory) and end up slowing things down without knowing it.

4. There are other issues, but I don't think cache thrashing is important. Most processors are sitting at around 512KB of L2 cache. If you use 128 megs of memory for hashing, that is a 256:1 memory:cache ratio, not counting the other stuff that needs to be in cache (instructions and non-hash data). I doubt that any two consecutive hash probes hit cache in the normal course of events anyway, because the memory addresses usually vary wildly from one probe to the next. I personally don't even consider this to be an issue.

5. TLB misses are a problem. But different processors have different TLB sizes, and the X86 architecture supports 4K pages as well as 2M or 4M pages (depends on the processor). If your O/S supports the big page size, then the TLB is not an issue for "normal" hash sizes. I.e. the opteron I ran on had something like 1096 (no, not 1024, but an oddball number) TLB entries. If you use 4MB page sizes, that will handle 4 gigs of RAM with no TLB misses at all once all the entries are filled in. That can be ignored. If the TLB is not big enough, it is going to be a problem in any case due to the random addressing patterns for the hash probes.

With respect to #5, typical memory latency is 125-150ns for the X86 boxes. If you probe memory in a tight "cluster" then you will lean on the TLB and all of your memory references will come in at the normal memory latency timing. If your addresses are spread out enough, then you have too many TLB misses, and you take two or three memory-latency cycles to access a single word of data, since the virtual address has to be translated to a physical address and this takes one or two extra memory accesses (depending on whether you use 4M or 4K pages). However, remember that you do one hash probe per node. So whether you have a big TLB and page size and always get TLB hits (and spend 125-150ns per hash probe), or you have a small TLB or small page size and never get a TLB hit on hash probes (and spend 250-300ns [4M page size] or 375-450ns [4K page size]), you still only incur that penalty once per node. I.e. at a million nodes per second, assuming you probe everywhere in the search, you get one million probes per second, which turns into a probe cost of 1M times X nanoseconds, where X varies as explained above (at 150ns per probe, for example, that is 0.15 seconds out of every second, or roughly 15% of the search time). It isn't free. But the hits save _far_ more, since you get to stop searching.
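To make the big-page point in #5 concrete, here is a minimal sketch of one way to get a hash table backed by 2MB huge pages on a reasonably recent Linux kernel. MAP_HUGETLB, the fallback logic, and the 256MB size are assumptions for illustration, not how Crafty actually allocates its table:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  /* Try to back the hash table with 2MB huge pages so a handful of TLB
     entries cover the whole table; fall back to ordinary 4K pages if no
     huge pages are configured (sketch only, not Crafty's code). */
  void *alloc_hash(size_t bytes) {
    void *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (mem == MAP_FAILED)       /* no huge pages: more TLB pressure */
      mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
      return NULL;
    memset(mem, 0, bytes);       /* touch every page up front */
    return mem;
  }

  int main(void) {
    size_t bytes = 256UL * 1024 * 1024;   /* 256MB table, for example */
    void *hash = alloc_hash(bytes);
    printf("hash table at %p (%zu bytes)\n", hash, bytes);
    return hash ? 0 : 1;
  }

On 2MB pages that 256MB table needs only 128 TLB entries; on 4K pages it needs 65,536, which is the latency difference described above.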
6. How big? Good question. On the quad I was searching 8-9M nodes per second. Use 8 for simplicity. I don't hash in the q-search, so assume that no more than one of every 8 nodes is hashed (and that might be high). I need 1M entries per second. Say 2 minutes for a long move. 120M entries times 16 bytes per entry is about 2 gigs of RAM. I ran with 3 gigs, although I could have gone higher. But leaving the extra 5 gigs (8 gigs total RAM) for the code, the O/S and the filesystem cache seemed reasonable.
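The arithmetic in #6 is easy to check mechanically. A back-of-the-envelope sketch using the numbers from the post (the one-in-eight hashing fraction and 16-byte entries come straight from the text; the program itself is only illustrative):

  #include <stdio.h>

  /* Hash sizing from point 6: nodes per second, the fraction of nodes
     actually hashed (no hashing in the q-search), the length of a long
     think, and the size of one hash entry. */
  int main(void) {
    double nps         = 8e6;    /* 8M nodes per second on the quad */
    double hashed_frac = 0.125;  /* roughly 1 node in 8 is hashed   */
    double seconds     = 120.0;  /* a long 2-minute move            */
    int    entry_bytes = 16;     /* one hash table entry            */

    double entries = nps * hashed_frac * seconds;   /* 120M entries */
    double bytes   = entries * entry_bytes;         /* ~1.9GB       */
    printf("%.0fM entries -> %.1f GB of hash\n",
           entries / 1e6, bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
  }

This prints "120M entries -> 1.8 GB of hash", which matches the "about 2 gigs" figure; whatever is left over stays with the O/S for code, buffers and the filesystem cache.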
The moral: don't go big _blindly_. There are other considerations to think about in choosing your hash size... That's why I added the adaptive hash command in Crafty, to let it figure that out optimally depending on the time control and max memory the user wants to allow in the worst case. There might be other considerations that I didn't mention, as well, but that is a good summary.