Computer Chess Club Archives


Subject: Re: Hash tables and data-cache, some programmer stuff...

Author: Ed Schröder

Date: 02:18:33 01/18/98


>Posted by Robert Hyatt on January 17, 1998 at 18:22:09:

>>You seem to have missed the point I was trying to make regarding
>>hash tables. If you use a big hash table you will access (write or
>>read) a wide area of memory. All of this is written to the data-cache
>>with the risk of overwriting "more important" information (such as
>>variables often used). Using a small hash table this risk is less.
>>Result a higher NPS.


>I don't see how this would happen.  IE on a pentium pro, you have 256KB
>of
>cache.  My hash entry is 16 bytes wide.  This means a 16K entry hash
>table
>will just exactly fit in cache.  If you use 32K or 64K you are already
>thrashing it badly...  I've never run with a hash that small.  I believe
>my default is 3/4mb, which is 3x the smaller P6 cache...

Just test it and you will see; see below.

Not to mention that I was mainly talking about the first-level cache,
not about the second one :)


>>The second point I was trying to make is the use of multiple hash
>>tries. If you do just one probe you save the data-cache. If you do
>>more (I do 3) this will lower the NPS because you (without realizing
>>it) pollute the data-cache with information that is not very useful.
>>Result a lower NPS because now in many cases information must come
>>from normal memory and you know how slow that is :)

>yes.  ALthough if you probe consecutive addresses the speed penalty is
>zilch.  Remember that the Pentiums all fetch 32 bytes from memory on
>each
>cache miss, because they all use 32 byte cache lines.  So you access any
>byte in memory and you stream in 32.  And there's nothing that can be
>done.  So doing 2 consecutive probes on 16 byte entries would not slow
>you down any more than doing only 1.  You thrash cache to exactly the
>same extent because of the line-fill policy... 3 is worse than 2, but
>4 is no worse than 3.  Again assuming you are accessing consecutive
>words.  I don't do this *yet* but I have started rewriting my two-level
>cache already to take advantage of this property of the Pentium cache...
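The consecutive-probe idea quoted above can be sketched in C. This is a minimal sketch; the entry layout and names are illustrative assumptions, not Crafty's actual hash structure:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte hash entry (field names are illustrative). */
typedef struct {
    uint64_t key;     /* full hash signature          */
    int32_t  score;   /* stored value                 */
    int32_t  info;    /* depth, bound type, best move */
} HashEntry;          /* 16 bytes: two entries per 32-byte Pentium cache line */

/* Probe two consecutive entries that share one cache line.  The first
   access streams the whole 32-byte line into cache, so the second
   compare hits in cache and costs almost nothing extra.  This assumes
   the table itself starts on a 32-byte boundary. */
HashEntry *probe_pair(HashEntry *table, size_t mask, uint64_t hash) {
    size_t base = ((size_t)(hash & mask)) & ~(size_t)1; /* even index of the pair */
    if (table[base].key == hash)     return &table[base];
    if (table[base + 1].key == hash) return &table[base + 1];
    return NULL;                     /* miss: neither entry matches */
}
```

As the quoted text says, this is why two probes cost about the same as one, while a third probe touches a second line and is the first step that actually thrashes more.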

Just create a dummy 4 MB area in Crafty and then, in every evaluation,
use the current hash key to read just the first four bytes of the
entry (so just one 32-bit load). Looking at this code it seems cheap and
you would expect no NPS drop at all, but the NPS will drop considerably.
Then replace the "one 32-bit load" with a "one 32-bit store" and the
NPS will drop further, all because the data cache is filled with useless
data. Then increase the 4 MB to 8/16/32 MB and NPS will drop further.
Then decrease the 4 MB to 2/1/0 and NPS will increase.
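The experiment described above might look roughly like this. The size, the toy key generator, and the function names are illustrative assumptions, not Crafty's real code:

```c
#include <stdint.h>

#define DUMMY_MB  4u
#define SLOTS     ((DUMMY_MB * 1024u * 1024u) / 4u)   /* 32-bit slots */

static uint32_t dummy[SLOTS];   /* the dummy 4 MB area; never holds useful data */

/* Stand-in for the search's constantly changing hash key (a simple LCG). */
static uint64_t next_key(uint64_t k) {
    return k * 6364136223846793005ULL + 1442695040888963407ULL;
}

/* Each toy "evaluation" does one 32-bit load from a random line of the
   dummy area.  Every such load pulls a useless 32-byte line into the
   data cache, evicting something useful.  Switching the load to the
   commented store variant also dirties the line, so eviction then costs
   a write-back as well, and NPS drops further still. */
uint64_t run_probes(int evals) {
    uint64_t key = 1, sum = 0;
    for (int i = 0; i < evals; i++) {
        key = next_key(key);
        sum += dummy[key % SLOTS];            /* the "one 32-bit load"  */
        /* dummy[key % SLOTS] = (uint32_t)key;   the "one 32-bit store" */
    }
    return sum;   /* returned so the compiler cannot optimize the loads away */
}
```

Timing `run_probes` with different `DUMMY_MB` values reproduces the effect: the work per call is identical, only the cache footprint grows.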

It's not such an important discussion topic, but it explains to me why
certain speed-up ideas turn out to be disappointing in the end.

But with today's "data-cache", "code-cache", "level-1 cache" and
"level-2 cache" you can never say for certain that "this speed-up idea
doesn't work", as with a 2-4-8 times bigger cache the idea might work
after all, which touches the soul of the subject in the above header.


>>Code-cache, data-cache it's really great but it makes programming
>>life more difficult at least if you want to know all the why's.

>not to mention all the variability.  There are lots of other issues.
>Cache, even n-way set associative cache uses direct-mapping to figure
>out
>which line (or set in set associative) a word must be put in.  Which
>means
>that your friendly (or unfriendly) operating system can load your
>program
>into memory in a way that makes it either "cache friendly" or "cache
>unfriendly" solely based on how it is loaded.  If it is put into
>contiguous
>real memory, you get the best possible cache mapping.  If you get
>scattered
>around (because the operating system can use the memory mapping facility
>of the memory management hardware to make scattered pages appear to be
>contiguous while executing the program) you might find that you have hot
>spots in cache, and that some cache lines are never used at all.
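The direct mapping described in the quote boils down to a simple index computation. Only the 256 KB size and 32-byte lines come from the text above; the associativity is an assumption for illustration:

```c
#include <stdint.h>

#define LINE_SIZE   32u                 /* bytes per cache line (Pentium)   */
#define CACHE_SIZE  (256u * 1024u)      /* 256 KB, as quoted above          */
#define WAYS        4u                  /* assumed associativity            */
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 2048 sets        */

/* Which set a physical address lands in.  Two addresses whose set
   indices match compete for the same few ways, which is why the OS's
   physical page placement alone can make a program "cache friendly"
   (spread over all sets) or "cache unfriendly" (hot spots, unused sets). */
static unsigned set_index(uintptr_t phys_addr) {
    return (unsigned)((phys_addr / LINE_SIZE) % NUM_SETS);
}
```

With these numbers, any two physical addresses 64 KB apart (2048 sets times 32 bytes) map to the same set and collide.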

>We generally see people starting and testing their NPS repeatedly until
>they get loaded into memory in a cache-friendly way...

It has always been my wish that the Intel guys would offer the world a
"software option" where the programmer can define the cache himself.

If you could define the memory address ranges to be cached, you could
speed up programs, and especially chess programs, tremendously.
Just put your most-used data in a pre-defined area and mark this
area for caching, and do the same for the code part. In this way, with
some smart programming, we would have had the possibility to run
our chess program entirely in super-fast level-1 cache. This
would have been fantastic for us speed addicts, no?

Well, that chance has been gone since the introduction of Windows 3.0,
which allows multiple programs to run. It would have been a fantastic
option for a stand-alone OS like MS-DOS.

- Ed -



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.