Author: Robert Hyatt
Date: 20:43:30 09/01/03
On August 29, 2003 at 12:25:23, Vincent Diepeveen wrote:

>On August 28, 2003 at 11:45:35, Robert Hyatt wrote:
>
>>On August 28, 2003 at 01:50:46, Johan de Koning wrote:
>>
>>>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>>>
>>>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>>>
>>>[snip]
>>>
>>>It seems we're finally back to where you and me started off.
>>>
>>>>>You keep saying that copy/make causes problems with cache to memory traffic.
>>>>>Here I was just saying it doesn't, if cache is plenty.
>>>>
>>>>Here is the problem:
>>>>
>>>>When you write to a line of cache, you _guarantee_ that entire line of cache
>>>>is going to be written back to memory. There are absolutely no exceptions to
>>>>that. So copying from one cache line to another means that "another line" is
>>>>going to generate memory traffic.
>>>
>>>Here is the solution: write-through caches were abandoned a long time ago.
>>
>>I'm not talking about write-through. I am talking about write-back. Once
>>you modify a line of cache, that line of cache _is_ going to be written back
>>to memory. When is hard to predict, but before it is replaced by another cache
>>line, it _will_ be written back. So you write one byte to cache on a PIV, you
>>are going to dump 128 bytes back to memory at some point. With only 4096 lines
>>of cache, it won't be long before that happens... And there is no way to
>>prevent it.
>
>Please stop the nonsense Bob about how processors deal with cache lines.
>
>You have *no idea* how modern processors work with cache lines.

Right. This from the person that _makes it up as he goes_...

>
>If your model of above here would be true,
>your own crafty program would run 2 times faster at modern CPUs.

And it does prior to the 128 byte line PIV... Your point would be?

Oh, as usual, you don't _have_ a point...

>
>Where modern starts already somewhere begin 90s, not including Cray processors
>of course.
>
>The sad thing is that quite some time ago at CCC, I already wrote how this
>works. Yet you can figure it out yourself in the processor manuals as well.
>
>But as long as you don't realize that processors do not write cache lines *just
>like that*, because they have a buffer which only writes it when some *other*
>cache line gets written, then you will never realize that your cache line will
>*never* get written when one of the processors gets a signal of some kind
>(control-c or whatever).

I've already explained write-back clearly. I don't know what _you_ are talking
about, but I know _exactly_ how write-back works. And when you modify a line of
cache, you _are_ going to do a memory write later when that cache line gets
replaced by something else. If you don't get that, you are beyond help...

>
>Still giving processor design Bob? If so then for someone who is teaching
>processor design you really live in the 70s still...

I'd rather live in the 70's than in your world of total information vacuum...

>
>Best regards,
>Vincent
>
>>>
>>>And for good reason, think of the frequency at which data is written (eg just
>>>stack frame). Once CPU speed / RAM speed hits 10 or so, write-through cache will
>>>cause almost any program to run RAM bound.
>>
>>Sure, but that wasn't what I was talking about. Once a line is "dirty" it is
>>going back to memory when it is time to replace it. With just 4K lines of
>>cache, they get recycled very quickly.
>>
>>>
>>>>>> I claimed that for _my_ program,
>>>>>>copy/make burned the bus up and getting rid of it made me go 25% faster.
>>>>>
>>>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>>>heavily used stuff.
>>>>
>>>>This was on a (originally) pentium pro, with (I believe) 256K of L2 cache.
>>>
>>>L2 is not a good place to keep your heavily used data.
>>
>>There's no other choice. L1 is not big enough for anything. IE the pentium
>>pro had 16K of L1, 8K data, 8K instruction. Newer pentiums are not much
>>better although the 8K instruction has been replaced by the new trace cache
>>that holds more than 8KB. And the data cache is up to 16K. However, I have
>>run personally on xeons with 512K L2, 1024K L2 and 2048K L2 and I didn't see
>>any significant difference in performance for my program... Bigger is slightly
>>better in each case, but it was never "big enough".
>>
>>>
>>>>However, I found the _same_ problem on other architectures, such as the Sparc
>>>>(super-sparc). However, I believe it would happen on my 1M L2 cache 700
>>>>mhz xeons as well, because my "kernel data" is quite large and anything that
>>>>displaces it from cache will hurt.
>>>
>>>Anything could happen, but is it worth the debugging and the added complexity
>>>if you don't even know the hot spots?
>>>Or reversed: if under slightly different circumstances the gain would have
>>>seemed to be around 0, would you have kept the prepare_undo and unmake code?
>>>
>>>... Johan
>>
>>For no gain, I wouldn't have changed, of course...
>>
>>But there was a significant gain at the time. I don't think the current
>>PIV with 512K L2 is much different from the original pentium pro with 256K L2.
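
For anyone who wants to play with the numbers in this argument, here is a rough Python sketch of the write-back behaviour described above: a dirty line costs nothing when it is written, and costs a full line of memory traffic when it is eventually replaced. It is only a toy model under loud assumptions: the cache is fully associative with LRU replacement (a real L2 is set associative), there are no write buffers, and the names `WriteBackCache` and `demo` are made up for this illustration. The 128-byte line size and the 4096-line count are the figures quoted in the thread.

```python
from collections import OrderedDict

LINE_BYTES = 128   # PIV line size quoted in the thread
NUM_LINES = 4096   # line count quoted in the thread (512K / 128 bytes)


class WriteBackCache:
    """Toy write-back cache: fully associative, LRU (assumed, not a real L2)."""

    def __init__(self, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()      # line tag -> dirty flag, oldest first
        self.bytes_written_back = 0     # traffic caused by evicting dirty lines

    def _touch(self, addr, dirty):
        tag = addr // self.line_bytes
        if tag in self.lines:
            # Hit: keep the old dirty state and move the line to the MRU end.
            was_dirty = self.lines.pop(tag)
            dirty = dirty or was_dirty
        elif len(self.lines) >= self.num_lines:
            # Miss with a full cache: evict the least recently used line.
            _, victim_dirty = self.lines.popitem(last=False)
            if victim_dirty:
                # The whole line goes back to memory, not just the changed byte.
                self.bytes_written_back += self.line_bytes
        self.lines[tag] = dirty

    def read(self, addr):
        self._touch(addr, False)

    def write(self, addr):
        self._touch(addr, True)


def demo():
    cache = WriteBackCache()
    # Store one byte into each of 4096 distinct lines: everything still fits.
    for i in range(NUM_LINES):
        cache.write(i * LINE_BYTES)
    traffic_while_it_fits = cache.bytes_written_back
    # Now read 4096 other lines: every miss pushes out one dirty line.
    for i in range(NUM_LINES, 2 * NUM_LINES):
        cache.read(i * LINE_BYTES)
    return traffic_while_it_fits, cache.bytes_written_back


if __name__ == "__main__":
    print(demo())
```

In this model both sides of the thread show up: as long as the dirty lines stay resident there is no write traffic at all, but once other data displaces them, 4096 single-byte stores turn into 4096 full-line write-backs. Whether a given program lands on one side or the other depends on its real working set, which this sketch does not try to measure.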
Last modified: Thu, 15 Apr 21 08:11:13 -0700