Computer Chess Club Archives


Subject: Re: Full circle

Author: Vincent Diepeveen

Date: 09:25:23 08/29/03


On August 28, 2003 at 11:45:35, Robert Hyatt wrote:

>On August 28, 2003 at 01:50:46, Johan de Koning wrote:
>
>>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>>
>>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>>
>>[snip]
>>
>>It seems we're finally back to where you and I started off.
>>
>>>>You keep saying that copy/make causes problems with cache to memory traffic.
>>>>Here I was just saying it doesn't, if cache is plenty.
>>>
>>>Here is the problem:
>>>
>>>When you write to a line of cache, you _guarantee_ that entire line of cache
>>>is going to be written back to memory.  There are absolutely no exceptions to
>>>that.  So copying from one cache line to another means that "another line" is
>>>going to generate memory traffic.
>>
>>Here is the solution: write-through caches were abandoned a long time ago.
>
>I'm not talking about write-through.  I am talking about write-back.  Once
>you modify a line of cache, that line of cache _is_ going to be written back
>to memory.  When is hard to predict, but before it is replaced by another cache
>line, it _will_ be written back.  So you write one byte to cache on a PIV, you
>are going to dump 128 bytes back to memory at some point.  With only 4096 lines
>of cache, it won't be long before that happens...  And there is no way to
>prevent it.
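
The write-back behaviour described in the quoted text above can be illustrated with a toy model. This is only a sketch and not code from any poster in the thread: it assumes a direct-mapped cache (real L2 caches are set-associative), it ignores the line fill that a store miss would trigger, and it ignores any write buffering. The names `WriteBackCache` and `demo` are made up for this example; the 4096 lines of 128 bytes are the figures quoted above.

```python
LINE_BYTES = 128   # line size quoted in the post for the PIV
NUM_LINES = 4096   # 4096 lines * 128 bytes = 512 KB of cache


class WriteBackCache:
    """Toy direct-mapped write-back cache that only counts write-back traffic."""

    def __init__(self, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines     # memory line currently held by each slot
        self.dirty = [False] * num_lines   # has the slot been modified since it was filled?
        self.bytes_written_back = 0

    def store(self, address):
        """Model a one-byte store to the given byte address."""
        line = address // self.line_bytes
        slot = line % self.num_lines
        if self.tags[slot] != line:
            # The slot holds a different line (or nothing): replace it,
            # and write the old line back to memory only if it was dirty.
            if self.dirty[slot]:
                self.bytes_written_back += self.line_bytes
            self.tags[slot] = line
            self.dirty[slot] = False
        # The store itself only marks the line dirty; no memory traffic yet.
        self.dirty[slot] = True


def demo():
    cache = WriteBackCache()
    # 1000 one-byte stores that all land in the same cache line.
    for i in range(1000):
        cache.store(i % LINE_BYTES)
    after_hot_stores = cache.bytes_written_back
    # One store to an address that maps to the same slot forces the replacement.
    cache.store(NUM_LINES * LINE_BYTES)
    after_conflict = cache.bytes_written_back
    return after_hot_stores, after_conflict
```

In this model, repeated stores to a line that stays resident generate no traffic, while a single conflicting store pushes the whole dirty line out. How often that replacement happens in a real program depends on the working set and the cache's associativity, which the sketch does not try to capture.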

Bob, please stop the nonsense about how processors deal with cache lines.

You have *no idea* how modern processors work with cache lines.

If your model above were true,
your own Crafty program would run 2 times faster on modern CPUs.

Where "modern" already starts somewhere in the early 90s, not including Cray
processors of course.

The sad thing is that quite some time ago at CCC, I already wrote how this
works. You can also figure it out yourself from the processor manuals.

But as long as you don't realize that processors do not write cache lines *just
like that*, because they have a buffer which only writes a line out when some
*other* cache line gets written, you will never realize that your cache line
will *never* get written when one of the processors gets a signal of some kind
(control-c or whatever).

Still teaching processor design, Bob? If so, then for someone who teaches
processor design you really still live in the 70s...

Best regards,
Vincent

>>
>>And for good reason: think of the frequency at which data is written (e.g. just
>>the stack frame). Once CPU speed / RAM speed hits 10 or so, a write-through
>>cache will cause almost any program to run RAM bound.
>
>Sure, but that wasn't what I was talking about.  Once a line is "dirty" it is
>going back to memory when it is time to replace it.  With just 4K lines of
>cache, they get recycled very quickly.
>
>>
>>>>>  I claimed that for _my_ program,
>>>>>copy/make burned the bus up and getting rid of it made me go 25% faster.
>>>>
>>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>>heavily used stuff.
>>>
>>>This was on a (originally) pentium pro, with (I believe) 256K of L2 cache.
>>
>>L2 is not a good place to keep your heavily used data.
>
>There's no other choice.  L1 is not big enough for anything.  I.e., the pentium
>pro had 16K of L1, 8K data, 8K instruction.  Newer pentiums are not much
>better although the 8K instruction has been replaced by the new trace cache
>that holds more than 8KB.  And the data cache is up to 16K.  However, I have
>run personally on xeons with 512K L2, 1024K L2 and 2048K L2 and I didn't see
>any significant difference in performance for my program...  Bigger is slightly
>better in each case, but it was never "big enough".
>
>
>
>
>>
>>>However, I found the _same_ problem on other architectures, such as the Sparc
>>>(super-sparc).  I believe it would also happen on my 1M L2 cache 700
>>>MHz xeons, because my "kernel data" is quite large and anything that
>>>displaces it from cache will hurt.
>>
>>Anything could happen, but is it worth the debugging and the added complexity
>>if you don't even know the hot spots?
>>Or reversed: if under slightly different circumstances the gain would have
>>seemed to be around 0, would you have kept the prepare_undo and unmake code?
>>
>>... Johan
>
>For no gain, I wouldn't have changed, of course...
>
>But there was a significant gain at the time.  I don't think the current
>PIV with 512K L2 is much different from the original pentium pro with 256K L2.
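
For readers unfamiliar with the two schemes argued about in this thread, here is a minimal sketch of copy/make versus make/unmake. It is not taken from Crafty or from any poster's engine: the `Position` class, the 64-entry `squares` list, and the function names are hypothetical, and real engines keep far more state (castling rights, hash keys, piece lists) than this.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    # Hypothetical minimal board: one character per square, "." means empty.
    squares: list = field(default_factory=lambda: ["."] * 64)


def copy_make(pos, src, dst):
    """Copy/make: clone the whole position, then apply the move to the clone.

    The parent position is left untouched, so no unmake step is needed,
    but every move writes a full copy of the position.
    """
    child = Position(list(pos.squares))
    child.squares[dst] = child.squares[src]
    child.squares[src] = "."
    return child


def make(pos, src, dst):
    """Make: apply the move in place and return the data needed to undo it."""
    captured = pos.squares[dst]
    pos.squares[dst] = pos.squares[src]
    pos.squares[src] = "."
    return (src, dst, captured)


def unmake(pos, undo):
    """Unmake: restore the position from the undo record returned by make()."""
    src, dst, captured = undo
    pos.squares[src] = pos.squares[dst]
    pos.squares[dst] = captured
```

The trade-off under discussion is that copy/make writes the whole structure on every move, while make/unmake touches only a few squares at the cost of extra undo code. Which one is faster on a given machine is exactly what the posters disagree about, and this sketch does not settle it.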




Last modified: Thu, 15 Apr 21 08:11:13 -0700
