Computer Chess Club Archives



Subject: Re: To Dr Hyatt re: hashtable size

Author: Robert Hyatt

Date: 08:12:42 02/05/04


On February 04, 2004 at 17:34:33, C McClain Morris, Jr. wrote:

>How did you go about determining hash table size for Crafty on that hardware in
>the tournament? Is there a formula you use for determining hash table size for
>various time controls? And, by the way, congratulations.


You have gotten some good answers.  I thought I would fill in a couple of
points that were missed.

1.  Bigger is better, up to a point.  What is that point?  Hard to say.  On the
quad Opteron, I was the only user, so I knew what was (and would be) going on
while I used it to play chess.  Since I knew I would not need to crank up some
large application (e.g. Netscape) while playing chess, I could set hash to
whatever I wanted without any worries.  If I might have to run something else,
then I would make sure that I didn't use so much memory for hashing that a big
application would make enough memory demands to suddenly cause paging in the
middle of a game.

2.  There is little point in going _too_ big.  It won't hurt, in general, but it
won't help either.  And if it causes interference when you run some other
application on rare occasions, it might be good to avoid trying to use all the
memory you can for hash.

3.  Further, going too big _can_ hurt performance.  The main issue is the
filesystem cache, e.g. for EGTBs (endgame tablebases).  No matter what EGTB
cache size you use, the O/S will still continue to buffer/cache filesystem data
as it is used, which makes the EGTB accesses go much faster.  If you grab most
of memory for hash, and you don't need anywhere near that much, you simply
deprive the O/S of a valuable resource (memory) and end up slowing things down
without knowing it.

4.  There are other issues, but I don't think cache thrashing is important.
Most processors are sitting at around 512KB of L2 cache.  If you use 128MB of
memory for hashing, that is a 256:1 memory:cache ratio, not counting the other
stuff that needs to be in cache (instructions and non-hash data).  I doubt that
any two consecutive hash probes hit cache in the normal course of events anyway,
because the memory addresses usually vary wildly from one probe to the next.  I
personally don't even consider this to be an issue.
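
To illustrate why consecutive probes land at unrelated addresses, here is a
tiny sketch (not Crafty's actual code; the names are made up for illustration):
the probe index comes from the position's Zobrist key, which is essentially a
64-bit pseudo-random number, so back-to-back probes almost never share a cache
line.

    #include <stdint.h>

    typedef struct {
        uint64_t key;       /* verification signature                  */
        uint64_t data;      /* score, depth, best move, bound type...  */
    } HashEntry;            /* 16 bytes per entry                      */

    HashEntry *hash_table;  /* allocated elsewhere                     */
    uint64_t   hash_mask;   /* (number of entries) - 1, a power of two */

    /* One probe per node: a single, effectively random, memory read. */
    HashEntry *probe(uint64_t zobrist_key) {
        return &hash_table[zobrist_key & hash_mask];
    }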

5.  TLB misses are a problem.  But different processors have different TLB
sizes, and the x86 architecture supports 4KB pages as well as 2MB or 4MB pages
(depends on the processor).  If your O/S supports the big page size, then the
TLB is not an issue for "normal" hash sizes.  E.g. the Opteron I ran on had
something like 1096 (no, not 1024, but an oddball number) TLB entries.  If you
use 4MB page sizes, that will handle 4 gigs of RAM with no TLB misses at all
once all the entries are filled in.  That can be ignored.  If the TLB is not
big enough, it is going to be a problem in any case, due to the random
addressing patterns of the hash probes.
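
To put numbers to that, here is a quick back-of-the-envelope calculation
(purely illustrative; the 1096-entry figure is the one quoted above, not a
datasheet value):

    #include <stdio.h>

    int main(void) {
        long long entries = 1096;               /* TLB entries (from above) */
        long long page_4k = 4LL * 1024;         /* 4KB pages                */
        long long page_4m = 4LL * 1024 * 1024;  /* 4MB large pages          */

        printf("4KB pages cover %lld MB\n", entries * page_4k / (1024 * 1024));
        printf("4MB pages cover %lld GB\n",
               entries * page_4m / (1024LL * 1024 * 1024));
        return 0;
    }
    /* 4KB pages cover 4 MB : tiny next to a multi-gigabyte hash table.
       4MB pages cover 4 GB : the whole hash table stays TLB-resident.  */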

With respect to #5, typical memory latency is 125-150ns for the x86 boxes.  If
you probe memory in a tight "cluster" then you will hit in the TLB and all of
your memory references will come in at the normal memory latency.  If your
addresses are spread out enough, then you have too many TLB misses, and you
take two or three memory latencies to access a single word of data, since the
virtual address has to be translated to a physical address and this takes one
or two extra memory accesses (one for 4MB pages, two for 4KB pages).  However,
remember that you do one hash probe per node.  So whether you have a big TLB
and page size and always get TLB hits (and spend 125-150ns per hash probe), or
you have a small TLB or small page size and never get a TLB hit on hash probes
(and spend 250-300ns [4MB page size] or 375-450ns [4KB page size]), you still
only incur that penalty once per node.  E.g. at a million nodes per second,
assuming you probe everywhere in the search, you get one million probes per
second, which turns into a probe cost of 1M times X nanoseconds, where X varies
as explained above.  It isn't free.  But the hits save _far_ more, since you
get to stop searching.
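
A worked version of that per-node arithmetic, using the rough latency figures
above (illustrative only, not measurements):

    #include <stdio.h>

    int main(void) {
        double nps     = 1e6;      /* nodes (and probes) per second        */
        double lat_hit = 150e-9;   /* TLB hit: one memory access  (~150ns) */
        double lat_4m  = 300e-9;   /* TLB miss, 4MB pages: ~2 accesses     */
        double lat_4k  = 450e-9;   /* TLB miss, 4KB pages: ~3 accesses     */

        printf("TLB hits    : %3.0f ms of probe latency per second\n",
               nps * lat_hit * 1e3);
        printf("misses, 4MB : %3.0f ms of probe latency per second\n",
               nps * lat_4m * 1e3);
        printf("misses, 4KB : %3.0f ms of probe latency per second\n",
               nps * lat_4k * 1e3);
        return 0;
    }
    /* Roughly 150/300/450 ms out of every second is spent waiting on hash
       probes.  Not free, but the cutoffs the table produces save far more. */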

6.  How big?  Good question.  On the quad I was searching 8-9M nodes per
second.  Use 8 for simplicity.  I don't hash in the q-search, so assume that no
more than one of every 8 nodes is hashed (and that might be high), which means
I need 1M entries per second.  Say 2 minutes for a long move.  120M entries
times 16 bytes per entry is about 2 gigs of RAM.  I ran with 3 gigs, although I
could have gone higher.  But leaving the extra 5 gigs (8 gigs total RAM) for
the code, the O/S and the filesystem cache seemed reasonable.
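
That sizing arithmetic, written out (the figures are the ones above: 8M nodes
per second, 1 in 8 nodes hashed, a 2-minute think, 16-byte entries; purely
illustrative):

    #include <stdio.h>

    int main(void) {
        double nps             = 8e6;      /* nodes per second on the quad */
        double hashed_fraction = 1.0 / 8;  /* no hashing in the q-search   */
        double seconds         = 120;      /* a long think                 */
        double entry_bytes     = 16;       /* bytes per hash entry         */

        double bytes = nps * hashed_fraction * seconds * entry_bytes;
        printf("Hash needed: about %.1f GB\n", bytes / (1024.0 * 1024 * 1024));
        return 0;
    }
    /* Prints about 1.8 GB, i.e. the "about 2 gigs" in the text. */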

The moral:  don't go big _blindly_.  There are other considerations to think
about in choosing your hash size...

That's why I added the adaptive hash command in Crafty, to let it figure that
out optimally depending on the time control and max memory the user wants to
allow in the worst case.
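
A minimal sketch of how such adaptive sizing might be done.  This is NOT
Crafty's actual implementation, just an illustration of the idea; the function
and parameter names are made up:

    #include <stdint.h>

    /* Scale the table to the expected number of probes in one long think,
       capped at the memory the user is willing to give up, then round down
       to a power of two so that index = key & (entries - 1) works.        */
    uint64_t adaptive_hash_bytes(double nps, double hashed_fraction,
                                 double seconds_per_move, uint64_t max_bytes) {
        const uint64_t entry_size = 16;   /* bytes per hash entry */
        uint64_t need = (uint64_t)(nps * hashed_fraction * seconds_per_move)
                        * entry_size;

        if (need > max_bytes)
            need = max_bytes;

        uint64_t size = 1;
        while (size * 2 <= need)
            size *= 2;
        return size;
    }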

There might be other considerations that I didn't mention, as well, but that is
a good summary.


