Computer Chess Club Archives


Subject: Re: the usual cache line length discussion

Author: Robert Hyatt

Date: 21:18:24 04/01/03



On April 01, 2003 at 21:32:35, Vincent Diepeveen wrote:

>On April 01, 2003 at 13:58:49, Robert Hyatt wrote:
>
>>On April 01, 2003 at 13:10:20, Vincent Diepeveen wrote:
>>>>
>>>
>>>Remember that the slow thing for hashtables is to get the bytes out of the RAM,
>>>not from L2 or L1 cache. Those caches are very fast on the P4.
>>>
>>
>>Have I _ever_ said otherwise?  That is why _latency_ is more important than
>>_bandwidth_ to a chess engine.
>>
>>
>>
>>>Actually I do assume 64 to 128 bytes with DIEP. I use 128 for the hashtable.
>>>
>>>Even then betting on 64 bytes is safe as that's what DDR RAM gives on the K7.
>>
>>DDR isn't giving 64 bytes.  The cache is _outside_ the memory system and doesn't
>>change depending on what kind of memory you have.  If the K7 has 64 byte line
>>length, then it simply has 64 byte lines regardless of memory.  Cache is
>>on-chip.  Memory is _way_ off-chip.
>>
>>
>>
>>
>>>
>>>Currently you use 32 bytes.
>>
>>Nope.  Currently a hash entry is a triplet, with 48 bytes.  I've already told
>>you this once.  But the hash entries are not on 32 or 64 byte boundaries so it
>>is likely that hash entries span two cache lines as a result.  If you don't do
>>your first probe on a multiple of 128 bytes, you are going to access two lines
>>if you need 128 bytes.
>
>If you code well you can prevent that of course.
>
>Even in C this is possible. As you might know a pointer is an address in memory.
>So it also tells you where the cache line starts. Set the pointer such that you
>always start at a cache line and you can prevent all horrors easily.

I've done this forever, you know.  I force my hash table to a 16 byte boundary,
because malloc() only guarantees an 8 byte boundary.  But I don't access a block
whose size divides the line length evenly.  I access 48 bytes.  So there is no
way to make that always lie in one cache line unless I change the basic idea to
4 or 8 entries, rather than 3.  I kept 3 so the node counts would match the old
2-table approach exactly.
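
To make the arithmetic concrete, here is a minimal C sketch of the two ideas in this reply: forcing a pointer up to a boundary, and counting how many cache lines a probe touches. The helper names align_up and lines_touched are made up for illustration and are not taken from any engine; the 64 byte line length is only an assumed example value.

```c
#include <assert.h>
#include <stdint.h>

/* Round pointer p up to the next multiple of align.
   Assumption: align is a power of two (16, 64, 128, ...). */
static void *align_up(void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + align - 1) & ~(align - 1));
}

/* Number of cache lines touched by len bytes that start at byte
   offset off, for a cache line of line bytes (len must be > 0). */
static unsigned lines_touched(uintptr_t off, uintptr_t len, uintptr_t line)
{
    return (unsigned)((off + len - 1) / line - off / line + 1);
}
```

With only 16 byte alignment, a 48 byte triplet can start at offset 0, 16, 32 or 48 within a 64 byte line. The first two starts stay inside one line; the last two spill into the next line. When using align_up on a malloc() result, allocate align - 1 extra bytes and keep the original pointer around for free().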


>
>>Cache doesn't load the first byte you access at the beginning of a line, so it
>>is possible that the first byte you access is anywhere in the line unless you
>>force alignment to 128 (or 64) byte boundaries.  I don't do that as I want the
>>1:2 ratio of depth-preferred to always-store entries.  And at 16 bytes per
>>entry, this puts three entries in 48 consecutive bytes, but only forced to 16
>>byte alignment.
>>
>>>
>>>Within a few years all manufacturers will trivially move up to at least 128
>>>bytes, simply because that gives more bandwidth. Big bandwidth is not the
>>>biggest issue for 99% of all applications (latency is more important, of
>>>course), but the market only knows the word 'bandwidth'. So they demand big
>>>bandwidth.
>>
>>
>>Maybe or maybe not.  Beyond a programmer's control however.  But internally L1
>>and L2 are probably not going to match as they did in the P5 days.  PIV is
>>certainly an example of two different line lengths.  And newer processors with
>>three levels of cache may take this to three different line lengths.  I try to
>>not worry about it.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
