Author: Robert Hyatt
Date: 08:45:35 08/28/03
On August 28, 2003 at 01:50:46, Johan de Koning wrote:

>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>
>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>
>[snip]
>
>It seems we're finally back to where you and I started off.
>
>>>You keep saying that copy/make causes problems with cache-to-memory traffic.
>>>Here I was just saying it doesn't, if cache is plenty.
>>
>>Here is the problem:
>>
>>When you write to a line of cache, you _guarantee_ that the entire line of
>>cache is going to be written back to memory. There are absolutely no
>>exceptions to that. So copying from one cache line to another means that
>>"other line" is going to generate memory traffic.
>
>Here is the solution: write-through caches were abandoned a long time ago.

I'm not talking about write-through. I am talking about write-back. Once you modify a line of cache, that line of cache _is_ going to be written back to memory. When is hard to predict, but before it is replaced by another cache line, it _will_ be written back. So if you write one byte to cache on a PIV, you are going to dump 128 bytes back to memory at some point. With only 4096 lines of cache, it won't be long before that happens... and there is no way to prevent it.

>And for good reason: think of the frequency at which data is written (e.g.
>just the stack frame). Once the CPU-to-RAM speed ratio hits 10 or so,
>write-through cache will cause almost any program to run RAM-bound.

Sure, but that wasn't what I was talking about. Once a line is "dirty" it is going back to memory when it is time to replace it. With just 4K lines of cache, they get recycled very quickly.

>>>>I claimed that for _my_ program, copy/make burned the bus up and getting
>>>>rid of it made me go 25% faster.
>>>
>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>heavily used stuff.
>>
>>This was on (originally) a Pentium Pro, with (I believe) 256K of L2 cache.
>
>L2 is not a good place to keep your heavily used data.
There's no other choice. L1 is not big enough for anything. For example, the Pentium Pro had 16K of L1: 8K data, 8K instruction. Newer Pentiums are not much better, although the 8K instruction cache has been replaced by the new trace cache, which holds more than 8KB, and the data cache is up to 16K. However, I have personally run on Xeons with 512K, 1024K and 2048K of L2, and I didn't see any significant difference in performance for my program... Bigger is slightly better in each case, but it was never "big enough".

>>However, I found the _same_ problem on other architectures, such as the
>>Sparc (SuperSparc). However, I believe it would happen on my 1M L2 cache
>>700MHz Xeons as well, because my "kernel data" is quite large and anything
>>that displaces it from cache will hurt.
>
>Anything could happen, but is it worth the debugging and the added complexity
>if you don't even know the hot spots?
>Or reversed: if under slightly different circumstances the gain would have
>seemed to be around 0, would you have kept the prepare_undo and unmake code?
>
>... Johan

For no gain, I wouldn't have changed, of course... But there was a significant gain at the time. I don't think the current PIV with 512K L2 is much different from the original Pentium Pro with 256K L2.
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.