Computer Chess Club Archives



Subject: Re: Full circle

Author: Robert Hyatt

Date: 08:45:35 08/28/03



On August 28, 2003 at 01:50:46, Johan de Koning wrote:

>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>
>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>
>[snip]
>
>It seems we're finally back to where you and I started off.
>
>>>You keep saying that copy/make causes problems with cache-to-memory traffic.
>>>Here I was just saying it doesn't, if cache is plentiful.
>>
>>Here is the problem:
>>
>>When you write to a line of cache, you _guarantee_ that the entire line of
>>cache is going to be written back to memory.  There are absolutely no
>>exceptions to that.  So copying from one cache line to another means that
>>"another line" is going to generate memory traffic.
>
>Here is the solution: write-through caches were abandoned a long time ago.

I'm not talking about write-through.  I am talking about write-back.  Once
you modify a line of cache, that line of cache _is_ going to be written back
to memory.  _When_ is hard to predict, but before it is replaced by another
cache line, it _will_ be written back.  So if you write one byte to cache on a
PIV, you are going to dump 128 bytes back to memory at some point.  With only
4096 lines of cache, it won't be long before that happens...  And there is no
way to prevent it.
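
To make the contrast concrete, here is a minimal C sketch (illustrative
only: the Position layout, sizes, and names are my own assumptions, not
Crafty's actual code).  Copy/make dirties every cache line the position
copy lands on, while make/unmake writes only the few bytes a move
actually changes:

  /* Minimal sketch; the Position layout is invented for illustration. */
  #include <string.h>

  typedef struct {
      unsigned char      board[64];       /* one byte per square */
      unsigned long long bitboards[12];   /* piece-location bitmaps */
      int                castle, ep_square, side;
  } Position;                             /* ~176 bytes: 2+ PIV lines */

  Position stack[64];                     /* one copy per ply */

  /* copy/make: every node writes a full copy of the position,
     dirtying every cache line the copy touches.  Each of those
     lines must eventually be written back to memory. */
  void make_copy(const Position *parent, int ply) {
      memcpy(&stack[ply], parent, sizeof(Position));
      /* ... then apply the move to stack[ply] ... */
  }

  /* make/unmake: save only what the move clobbers, modify the
     position in place, and restore it on the way back up.  Only a
     handful of bytes (usually one or two lines) get dirtied. */
  typedef struct { int from, to, captured, old_castle, old_ep; } Undo;

  void make(Position *p, int from, int to, Undo *u) {
      u->from       = from;
      u->to         = to;
      u->captured   = p->board[to];
      u->old_castle = p->castle;
      u->old_ep     = p->ep_square;
      p->board[to]   = p->board[from];
      p->board[from] = 0;
  }

  void unmake(Position *p, const Undo *u) {
      p->board[u->from] = p->board[u->to];
      p->board[u->to]   = u->captured;
      p->castle         = u->old_castle;
      p->ep_square      = u->old_ep;
  }

Per node, copy/make writes sizeof(Position) bytes no matter how small the
move is; make/unmake writes a few bytes plus the small Undo record.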

>
>And for good reason: think of the frequency at which data is written (e.g.
>just the stack frame).  Once the CPU speed / RAM speed ratio hits 10 or so,
>write-through cache will cause almost any program to run RAM bound.

Sure, but that wasn't what I was talking about.  Once a line is "dirty", it is
going back to memory when it is time to replace it.  With just 4K lines of
cache, they get recycled very quickly.
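
A back-of-the-envelope illustration (the node rate and position size are
assumed round numbers, not measured figures):

  512 KB L2 / 128 bytes per line  = 4096 lines
  copy/make, 256-byte position    = 2 lines dirtied per node
  2M nodes/sec x 256 bytes        = 512 MB/sec of newly dirtied data

At that rate the 4096 lines are cycled through in about a millisecond, and
every dirty line evicted has to be written back first, so the bus carries
all of that write-back traffic on top of the normal reads.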

>
>>>>  I claimed that for _my_ program,
>>>>copy/make burned the bus up and getting rid of it made me go 25% faster.
>>>
>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>heavily used stuff.
>>
>>This was on an (originally) Pentium Pro, with (I believe) 256K of L2 cache.
>
>L2 is not a good place to keep your heavily used data.

There's no other choice.  L1 is not big enough for anything.  I.e., the Pentium
Pro had 16K of L1: 8K data, 8K instruction.  Newer Pentiums are not much
better, although the 8K instruction cache has been replaced by the new trace
cache that holds more than 8KB, and the data cache is up to 16K.  However, I
have personally run on Xeons with 512K L2, 1024K L2 and 2048K L2, and I didn't
see any significant difference in performance for my program...  Bigger is
slightly better in each case, but it was never "big enough".




>
>>However, I found the _same_ problem on other architectures, such as the SPARC
>>(SuperSPARC).  And I believe it would happen on my 700 MHz Xeons with 1M of
>>L2 cache as well, because my "kernel data" is quite large and anything that
>>displaces it from cache will hurt.
>
>Anything could happen, but is it worth the debugging and the added complexity
>if you don't even know the hot spots?
>Or the reverse: if, under slightly different circumstances, the gain had
>seemed to be around 0, would you have kept the prepare_undo and unmake code?
>
>... Johan

For no gain, I wouldn't have changed it, of course...

But there was a significant gain at the time.  I don't think the current
PIV with 512K L2 is much different from the original Pentium Pro with 256K L2.



