Computer Chess Club Archives


Subject: Re: Hash tables and data-cache, some programmer stuff...

Author: Robert Hyatt

Date: 15:22:09 01/17/98



On January 17, 1998 at 13:23:51, Ed Schröder wrote:

>>Posted by Robert Hyatt on January 17, 1998 at 11:25:19:
>
>>On January 17, 1998 at 05:22:36, Ed Schröder wrote:
>
>>>>Posted by Bruce Moreland on January 17, 1998 at 03:25:31:
>
>>>>On January 17, 1998 at 02:21:54, Ed Schröder wrote:
>
>>>>>Every programmer has his own view of how to program the hash table.
>>>>>Typical for Rebel is that it stores and uses the q-search results in
>>>>>the hash table (most programs don't do that, true?). Another typical
>>>>>Rebel hash table behavior is that it doesn't store positions in the
>>>>>hash table that have no "best move". The latter speeds up Rebel a
>>>>>little.
>
>>This I don't quite follow.  *every* node that "fails low" fits into this
>>category, because if you search every node and each move fails low, you
>>have no idea which move was best.  If you don't store this, then when you
>>reach this same position again, at the same depth, you get to search it
>>all over again.  I, on the other hand, will look it up, find that it
>>failed low before, and simply fail low here with no searching.
>
>>Are we talking the same kind of "no best move"???
>
>I think so.
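
(An aside for readers following the thread: below is a minimal sketch, in
generic C, of the "fail low with no best move" idea being discussed.  The
entry layout and the names are purely illustrative, not Rebel's or
Crafty's actual code.)

#include <stdint.h>

enum { BOUND_EXACT, BOUND_LOWER, BOUND_UPPER };

typedef struct {
  uint64_t key;        /* Zobrist signature of the position             */
  int16_t  value;      /* score or bound                                */
  uint8_t  depth;      /* remaining depth the entry was searched to     */
  uint8_t  flag;       /* BOUND_EXACT / BOUND_LOWER / BOUND_UPPER       */
  uint16_t best_move;  /* 0 for a fail-low node: there is no best move  */
} HashEntry;

/* Returns 1 when a stored fail-low (upper bound) result lets the current
   node fail low again, at sufficient depth, with no searching at all.   */
static int fail_low_cutoff(const HashEntry *table, uint64_t mask,
                           uint64_t key, int depth, int alpha) {
  const HashEntry *e = &table[key & mask];
  return e->key == key && e->depth >= depth &&
         e->flag == BOUND_UPPER && e->value <= alpha;
}

(Such an entry would be stored with flag = BOUND_UPPER and best_move = 0
whenever every move at the node failed to raise alpha.)
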
>
>
>>>>I don't do this q-search thing because our searches are probably a lot
>>>>different.  I wouldn't have much use for it the way I do things.
>
>>>>But I haven't spent much time trying to figure out how to make the most
>>>>efficient use of hash memory.  I use the Thompson dual-table scheme
>>>>described numerous places, which has always worked fine for me.
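
(For anyone who hasn't run across the scheme: here is a rough sketch of a
two-table replacement policy of the kind referred to above, with made-up
names and sizes.  One table keeps the deepest result seen for its slot,
the other always takes the newest; a probe checks both.  This is a generic
illustration, not Bruce's or anyone else's actual code.)

#include <stdint.h>

typedef struct {
  uint64_t key;
  int16_t  value;
  uint8_t  depth;
  uint8_t  flag;
  uint16_t best_move;
} Entry;

#define TABLE_BITS 16
#define TABLE_SIZE (1u << TABLE_BITS)
#define TABLE_MASK (TABLE_SIZE - 1)

static Entry depth_preferred[TABLE_SIZE];
static Entry always_replace[TABLE_SIZE];

static void store(const Entry *e) {
  unsigned i = (unsigned)(e->key & TABLE_MASK);
  if (e->depth >= depth_preferred[i].depth)
    depth_preferred[i] = *e;  /* deeper (or equal) searches win this slot */
  else
    always_replace[i] = *e;   /* shallower results are still kept here    */
}

static const Entry *probe(uint64_t key) {
  unsigned i = (unsigned)(key & TABLE_MASK);
  if (depth_preferred[i].key == key) return &depth_preferred[i];
  if (always_replace[i].key == key)  return &always_replace[i];
  return 0;
}
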
>
>>>>I like the idea of not storing elements that don't have a "best" move
>>>>attached.  You lose the ability to fail low on a node based upon the
>>>>value in the hash table, but a little bit of statistical collection
>>>>seems to indicate that this is pretty rare.
>
>>I just ran a couple of tests.  You should run this on fine #70, which is
>>the most friendly of all positions to the hash table.  Eliminating the
>>fail-low return slows it terribly.  IE I got to depth=27 in 3 seconds
>>normally, but with the fail-low return removed I only reached depth 24
>>in 12 seconds.
>
>
>>>Ok, some programmer stuff...
>
>>>The reason for the "little" speedup in Rebel (leaving entries with no
>>>best-move out of the hash table) is that it saves entries in the PC
>>>data-cache. Of course this may not work in every program, but I can
>>>also imagine you can speed up a chess program by 5-15% for free. And
>>>there is no risk.
>
>>>A few years ago I re-wrote a very time-consuming part of Rebel's
>>>evaluation function. Instead of massive code I pre-calculated all
>>>possibilities and stored the results in 4 MB of memory. Then a simple
>>>table lookup replaced all the massive code.
>
>>>I expected a huge speedup of 30-35% but it didn't happen.
>>>In fact it was a little bit slower.
>
>>>The reason is the limited PC data cache. I assume lots of useful
>>>data (normally kept) now simply disappeared from the PC data-cache
>>>due to all the extensive 4 MB table lookups. I have no other
>>>explanation.
>
>>>So besides all the "pro's" and "contra's" of different hash table
>>>approaches, I think one should also take the machine's data-cache
>>>behavior into account for a final judgement.
>
>>>If I test Rebel with 512 KB hash and the same position with 28 MB hash,
>>>then the "average NPS" with 512 KB is obviously higher. It's not because
>>>less code has to be examined by the program but because of the loss of
>>>important data in the PC data-cache.
>
>>there is another important explanation for going faster.  As you add
>>hash entries, you improve the ability for information to flow across
>>different parts of the tree.  Specifically, as the hash table becomes
>>bigger, it produces more hits, and a hash hit will lower your NPS because
>>you update the board, recursively call search, and invest all that work
>>only to return with a score or bound immediately.  If you didn't get a
>>hit, you would search after investing in the preliminary setup work, and
>>see your NPS go up a little.
>
>It depends of course on how you count NPS.
>I remember yours is different from mine.
>
>
>>I've noticed this behavior for years, even in Cray Blitz.  And the Cray
>>XMP/YMP/C90 don't have *any* sort of data cache at all.  So it couldn't
>>be a cache issue on those machines.
>
>Same answer.
>
>
>>>End of programmer stuff...
>
>>>Of course using 28 MB hash will give much faster results than using
>>>a hash table of 512 KB but that's another story.
>
>>>- Ed -
>
>
>>even more important at times is it gives much better *results*.  IE if
>>you try fine #70, it is *impossible* to solve with a 19 ply search,
>>because there is no way to win a pawn within 19 plies.  But the hash
>>table lets info flow from searches where the opponent plays badly, to
>>searches where the opponent plays well, and you get the correct result
>>plies quicker.
>
>>I don't have the book, but I believe this position is either 23 or 26
>>plies deep, based on the analysis Monty Newborn did in his
>>king-and-pawn-only endgame program written many years ago.
>
>>So you really can't compare NPS with big/little hash tables, and you
>>can't compare time-to-depth either.  Most of the time you get deeper
>>quicker, but at times you get deeper slower, because you are getting a
>>better result at the same depth than you got without the bigger hash
>>table.
>
>You seem to have missed the point I was trying to make regarding
>hash tables. If you use a big hash table you will access (write or
>read) a wide area of memory. All of this is written to the data-cache
>with the risk of overwriting "more important" information (such as
>frequently used variables). With a small hash table this risk is
>smaller. The result is a higher NPS.
>


I don't see how this would happen.  IE on a Pentium Pro, you have 256KB
of cache.  My hash entry is 16 bytes wide.  This means a 16K-entry hash
table will just exactly fit in cache.  If you use 32K or 64K entries you
are already thrashing it badly...  I've never run with a hash that small.
I believe my default is 3/4MB, which is 3x the smaller P6 cache...
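
(Just to spell the arithmetic out, a throwaway check of those numbers;
the 256KB and 16 bytes are the figures quoted above, the rest is only an
illustration.)

#include <stdio.h>

int main(void) {
  const unsigned cache_bytes = 256 * 1024;   /* P6 L2 cache size quoted  */
  const unsigned entry_bytes = 16;           /* one hash entry           */
  printf("entries that fit in cache: %u\n", cache_bytes / entry_bytes);
  /* prints 16384: a 16K-entry table just fills the cache, and a 3/4MB
     table is three times that */
  return 0;
}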



>The second point I was trying to make is the use of multiple hash
>tries. If you do just one probe you spare the data-cache. If you do
>more (I do 3) this will lower the NPS because you (without realizing
>it) pollute the data-cache with information that is not very useful.
>The result is a lower NPS because now in many cases information must
>come from normal memory and you know how slow that is :)

yes.  Although if you probe consecutive addresses the speed penalty is
zilch.  Remember that the Pentiums all fetch 32 bytes from memory on each
cache miss, because they all use 32-byte cache lines.  So you access any
byte in memory and you stream in 32.  And there's nothing that can be
done.  So doing 2 consecutive probes on 16-byte entries would not slow
you down any more than doing only 1.  You thrash cache to exactly the
same extent because of the line-fill policy... 3 is worse than 2, but
4 is no worse than 3.  Again assuming you are accessing consecutive
words.  I don't do this *yet*, but I have started rewriting my two-level
cache already to take advantage of this property of the Pentium cache...
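
(A sketch of the kind of layout that makes two consecutive probes as
cheap as one, assuming 32-byte lines and 16-byte entries as above; this
is generic code, not what Crafty actually does.)

#include <stdint.h>

typedef struct {
  uint64_t key;          /* position signature                           */
  uint64_t data;         /* score, bound, depth and move packed together */
} Entry;                 /* 16 bytes */

typedef struct {
  Entry slot[2];         /* 32 bytes: one Pentium cache line, provided
                            the table is allocated 32-byte aligned       */
} Bucket;

/* Probing both slots touches exactly one cache line, so it costs the
   same single line fill as probing just one of them.                    */
static int probe(const Bucket *table, uint64_t mask,
                 uint64_t key, uint64_t *data) {
  const Bucket *b = &table[key & mask];     /* table size: power of two  */
  for (int i = 0; i < 2; i++)
    if (b->slot[i].key == key) {
      *data = b->slot[i].data;
      return 1;
    }
  return 0;
}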



>
>Code-cache, data-cache it's really great but it makes programming
>life more difficult at least if you want to know all the why's.

not to mention all the variability.  There are lots of other issues.
Cache, even n-way set-associative cache, uses direct mapping to figure
out which line (or set, in set-associative) a word must be put in.  Which
means that your friendly (or unfriendly) operating system can load your
program into memory in a way that makes it either "cache friendly" or
"cache unfriendly" solely based on how it is loaded.  If it is put into
contiguous real memory, you get the best possible cache mapping.  If you
get scattered around (because the operating system can use the memory
mapping facility of the memory management hardware to make scattered
pages appear to be contiguous while executing the program) you might find
that you have hot spots in cache, and that some cache lines are never
used at all.
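
(To make the mapping concrete, a toy calculation of which cache set an
address lands in.  The 256KB/4-way/32-byte figures are just an example,
and real hardware indexes on physical addresses, which is exactly why the
operating system's page placement matters.)

#include <stdio.h>

int main(void) {
  const unsigned line_bytes = 32;                  /* Pentium line size  */
  const unsigned ways       = 4;                   /* example cache      */
  const unsigned cache_size = 256 * 1024;
  const unsigned sets       = cache_size / (line_bytes * ways);  /* 2048 */

  unsigned long addr = 0x001ABCD0UL;               /* example address    */
  unsigned set = (unsigned)((addr / line_bytes) % sets);
  printf("address 0x%08lX maps to set %u of %u\n", addr, set, sets);
  return 0;
}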

We generally see people restarting their program and re-measuring NPS
repeatedly until it gets loaded into memory in a cache-friendly way...


>
>- Ed -
>
>
>>>>For those who don't know what you mean, the idea here is that a hash
>>>>entry is good for two things:
>>
>>>>1) Cutting off based upon the bound or score contained in it.
>>>>2) Remembering the best move in the position so you can search it first
>>>>next iteration.
>>
>>>>This second reason is the most important one in almost every case.  A
>>>>lot of times,  you are not going to *have* a best move to store in your
>>>>hash element, since you fail low when you search this node.  So in these
>>>>cases, you get less benefit from hashing this node, and you might get
>>>>more from leaving what was there before alone.
>>
>>I don't agree here.  The best move is important for ordering.  But that
>>is a secondary effect.  The fact that you failed low in an identical
>>position at the same depth eliminates a lot more work than simply trying
>>the best move first will.  Because when you flip from one iteration to
>>the next, the best move won't help when you are at a node that is
>>supposed to fail low.  But not searching below that point can certainly
>>help.
>>
>>
>>>>Thompson's dual table method would make this optimization a little hard,
>>>>since you write into your table in order to remember repetitions, but
>>>>perhaps it can be salvaged.
>>
>>I don't use the hash table for repetitions.  I have a simple repetition
>>list that I search.  Since Crafty implements the 50-move rule, this
>>counter tells you how far you have to go back in the repetition list.
>>I tried the hash approach, but it was no faster for me.  And with
>>internal iterative deepening (what I do when I search a non-fail-low
>>position and there is no hash table entry, such as what happens after
>>the root move fails high and has to be re-searched) the "open position"
>>idea to detect draws became more complicated.
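
(A generic sketch of the repetition-list idea, since the role of the
50-move counter is easy to miss; the names and the array size here are
invented for the example, not Crafty's.)

#include <stdint.h>

typedef struct {
  uint64_t keys[1024];  /* position signatures along the current path    */
  int      ply;         /* index of the current position in keys[]       */
  int      fifty;       /* reversible half-moves since the last capture
                           or pawn move                                  */
} PathHistory;

/* Only positions played since the last irreversible move can repeat the
   current one, and only those with the same side to move, so we step
   back two plies at a time and stop at the 50-move boundary.            */
static int is_repetition(const PathHistory *h) {
  int limit = h->ply - h->fifty;
  for (int i = h->ply - 2; i >= limit && i >= 0; i -= 2)
    if (h->keys[i] == h->keys[h->ply])
      return 1;
  return 0;
}
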
>>
>>
>>>>>The story: take 40 chess programs and you will notice 40 different hash
>>>>>table behaviors.
>>
>>>>No kidding.
>>
>>>>bruce


