Computer Chess Club Archives


Subject: Re: Full circle

Author: Robert Hyatt

Date: 20:43:30 09/01/03


On August 29, 2003 at 12:25:23, Vincent Diepeveen wrote:

>On August 28, 2003 at 11:45:35, Robert Hyatt wrote:
>
>>On August 28, 2003 at 01:50:46, Johan de Koning wrote:
>>
>>>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>>>
>>>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>>>
>>>[snip]
>>>
>>>It seems we're finally back to where you and I started off.
>>>
>>>>>You keep saying that copy/make causes problems with cache-to-memory traffic.
>>>>>Here I was just saying it doesn't, if cache is plenty.
>>>>
>>>>Here is the problem:
>>>>
>>>>When you write to a line of cache, you _guarantee_ that entire line of cache
>>>>is going to be written back to memory.  There are absolutely no exceptions to
>>>>that.  So copying from one cache line to another means that "another line" is
>>>>going to generate memory traffic.
>>>
>>>Here is the solution: write-through caches were abandoned a long time ago.
>>
>>I'm not talking about write-through.  I am talking about write-back.  Once
>>you modify a line of cache, that line of cache _is_ going to be written back
>>to memory.  When is hard to predict, but before it is replaced by another cache
>>line, it _will_ be written back.  So you write one byte to cache on a PIV, you
>>are going to dump 128 bytes back to memory at some point.  With only 4096 lines
>>of cache, it won't be long before that happens...  And there is no way to
>>prevent it.
>
>Please stop the nonsense Bob about how processors deal with cache lines.
>
>You have *no idea* how modern processors work with cache lines.

Right.  This from the person that _makes it up as he goes_...


>
>If your model above were true,
>your own Crafty program would run 2 times faster on modern CPUs.

And it does prior to the 128 byte line PIV...  Your point would be?

Oh, as usual, you don't _have_ a point...


>
>Where "modern" already starts somewhere in the early 90s, not including Cray
>processors of course.
>
>The sad thing is that quite some time ago at CCC, i already wrote how this
>works. Yet you can figure it out yourself in the processor manuals as well.
>
>But as long as you don't realize that processors do not write cache lines *just
>like that*, because they have a buffer which only writes it when some *other*
>cache line gets written, then you will never realize that your cache line will
>*never* get written when one of the processors gets a signal of some kind
>(control-c or whatever).


I've already explained write-back clearly.  I don't know what _you_ are talking
about, but I know _exactly_ how write-back works.  And when you modify a line
of cache, you _are_ going to do a memory write later when that cache line
gets replaced by something else.

If you don't get that, you are beyond help...
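
As a rough illustration of the write-back behavior described above, here is a minimal sketch in C of a direct-mapped write-back cache. The 128-byte line and the 4096-line count come from the post (512K / 128 = 4096). The direct-mapped organization, the Cache struct, and the function names are simplifying assumptions made for illustration; real L2 caches are set-associative, so this is not a model of any actual processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 128u   /* line size quoted in the post */
#define NUM_LINES  4096u  /* 512 KB / 128 B, the "4096 lines" in the post */

/* Hypothetical direct-mapped write-back cache (tags only, no data). */
typedef struct {
    uint64_t      tag[NUM_LINES];
    unsigned char valid[NUM_LINES];
    unsigned char dirty[NUM_LINES];
    uint64_t      bytes_written_back;  /* traffic caused by dirty evictions */
} Cache;

void cache_init(Cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Touch one byte at addr; is_write != 0 marks the whole line dirty. */
void cache_access(Cache *c, uint64_t addr, int is_write)
{
    uint64_t line = addr / LINE_BYTES;
    size_t   idx  = (size_t)(line % NUM_LINES);
    uint64_t tag  = line / NUM_LINES;

    assert(c != NULL);
    if (!c->valid[idx] || c->tag[idx] != tag) {
        /* Miss: the old line is replaced. If it was modified, the
           entire line goes back to memory, not just the changed byte. */
        if (c->valid[idx] && c->dirty[idx])
            c->bytes_written_back += LINE_BYTES;
        c->valid[idx] = 1;
        c->tag[idx]   = tag;
        c->dirty[idx] = 0;
    }
    if (is_write)
        c->dirty[idx] = 1;
}
```

In this model, writing a single byte marks the whole line dirty, and replacing that line later costs a 128-byte write to memory. A line that is rewritten many times while it stays resident is still written back only once, which is why the size of the working set relative to the cache matters in this argument.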

>
>Still teaching processor design, Bob? If so, then for someone who is teaching
>processor design you really still live in the 70s...

I'd rather live in the 70's than in your world of total information
vacuum...





>
>Best regards,
>Vincent
>
>>>
>>>And for good reason, think of the frequency at which data is written (e.g. just
>>>the stack frame). Once CPU speed / RAM speed hits 10 or so, write-through cache will
>>>cause almost any program to run RAM bound.
>>
>>Sure, but that wasn't what I was talking about.  Once a line is "dirty" it is
>>going back to memory when it is time to replace it.  With just 4K lines of
>>cache, they get recycled very quickly.
>>
>>>
>>>>>>  I claimed that for _my_ program,
>>>>>>copy/make burned the bus up and getting rid of it made me go 25% faster.
>>>>>
>>>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>>>heavily used stuff.
>>>>
>>>>This was (originally) on a Pentium Pro, with (I believe) 256K of L2 cache.
>>>
>>>L2 is not a good place to keep your heavily used data.
>>
>>There's no other choice.  L1 is not big enough for anything.  IE the pentium
>>pro had 16K of L1, 8K data, 8K instruction.  Newer pentiums are not much
>>better although the 8K instruction has been replaced by the new trace cache
>>that holds more than 8KB.  And the data cache is up to 16K.  However, I have
>>run personally on xeons with 512K L2, 1024K L2 and 2048K L2 and I didn't see
>>any significant difference in performance for my program...  Bigger is slightly
>>better in each case, but it was never "big enough".
>>
>>
>>
>>
>>>
>>>>However, I found the _same_ problem on other architectures, such as the Sparc
>>>>(super-sparc).  However, I believe it would happen on my 1M L2 cache 700
>>>>mhz xeons as well, because my "kernel data" is quite large and anything that
>>>>displaces it from cache will hurt.
>>>
>>>Anything could happen, but is it worth the debugging and the added complexity
>>>if you don't even know the hot spots?
>>>Or reversed: if under slightly different circumstances the gain would have
>>>seemed to be around 0, would you have kept the prepare_undo and unmake code?
>>>
>>>... Johan
>>
>>For no gain, I wouldn't have changed, of course...
>>
>>But there was a significant gain at the time.  I don't think the current
>>PIV with 512K L2 is much different from the original pentium pro with 256K L2.
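
For readers unfamiliar with the two approaches argued over in the quoted thread, here is a minimal sketch in C of copy/make versus make/unmake. Everything in it is hypothetical and chosen for illustration: the Position, Move, and Undo types, the one-byte-per-square board, and the function names are assumptions, and none of this is taken from Crafty or any other engine.

```c
#include <assert.h>

#define MAX_PLY 64

/* Hypothetical, heavily simplified position. */
typedef struct {
    signed char board[64];   /* piece code per square, 0 = empty */
    int         side_to_move;
} Position;

typedef struct { int from, to; } Move;
typedef struct { signed char captured; } Undo;

/* Copy/make: duplicate the whole position for the next ply, then
   modify the copy. Nothing needs undoing, but every move writes
   sizeof(Position) bytes into fresh cache lines. */
void copy_make(Position stack[], int ply, Move m)
{
    assert(ply + 1 < MAX_PLY);
    stack[ply + 1] = stack[ply];
    Position *p = &stack[ply + 1];
    p->board[m.to]   = p->board[m.from];
    p->board[m.from] = 0;
    p->side_to_move ^= 1;
}

/* Make/unmake: modify one position in place and save only what is
   needed to restore it. Fewer bytes are written per move, at the
   price of extra undo code. */
Undo make(Position *p, Move m)
{
    Undo u = { p->board[m.to] };
    p->board[m.to]   = p->board[m.from];
    p->board[m.from] = 0;
    p->side_to_move ^= 1;
    return u;
}

void unmake(Position *p, Move m, Undo u)
{
    p->board[m.from] = p->board[m.to];
    p->board[m.to]   = u.captured;
    p->side_to_move ^= 1;
}
```

The sketch only shows where the writes go. Whether the extra copies actually cost bus bandwidth depends on how large the real position structure is and on whether the per-ply copies stay resident in cache, which is the point in dispute above.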



