Computer Chess Club Archives


Subject: Re: Full circle

Author: Robert Hyatt

Date: 20:35:51 09/01/03


On August 30, 2003 at 10:36:53, Vincent Diepeveen wrote:

>On August 30, 2003 at 02:58:53, Johan de Koning wrote:
>
>>On August 29, 2003 at 08:53:50, Robert Hyatt wrote:
>>
>>>On August 29, 2003 at 02:34:42, Johan de Koning wrote:
>>>
>>>>On August 28, 2003 at 11:45:35, Robert Hyatt wrote:
>>>>
>>>>>On August 28, 2003 at 01:50:46, Johan de Koning wrote:
>>>>>
>>>>>>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>>>>>>
>>>>>>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>>>>>>
>>>>>>[snip]
>>>>>>
>>>>>>It seems we're finally back to where you and I started off.
>>>>>>
>>>>>>>>You keep saying that copy/make causes problems with cache-to-memory traffic.
>>>>>>>>Here I was just saying it doesn't, if cache is plenty.
>>>>>>>
>>>>>>>Here is the problem:
>>>>>>>
>>>>>>>When you write to a line of cache, you _guarantee_ that entire line of cache
>>>>>>>is going to be written back to memory.  There are absolutely no exceptions to
>>>>>>>that.  So copying from one cache line to another means that "another line" is
>>>>>>>going to generate memory traffic.
>>>>>>
>>>>>>Here is the solution: write-through caches were abandoned a long time ago.
>>>>>
>>>>>I'm not talking about write-through.
>>>>
>>>>I'm glad you aren't. :-)
>>>>
>>>>>  I am talking about write-back.  Once
>>>>>you modify a line of cache, that line of cache _is_ going to be written back
>>>>>to memory.  When is hard to predict, but before it is replaced by another cache
>>>>>line, it _will_ be written back.  So you write one byte to cache on a PIV, you
>>>>>are going to dump 128 bytes back to memory at some point.  With only 4096 lines
>>>>>of cache, it won't be long before that happens...  And there is no way to
>>>>>prevent it.
>>>>
>>>>Sure, every dirty cache line will be written back at *some* point. But you're
>>>>allowed to use or update it a million times before it is flushed only once.
>>>>Number of cache lines has nothing to do with it. On a lean and empty system
>>>>some lines might even survive until after program termination.
>>>
>>>Number of cache lines has everything to do with it.  If you can keep 4K
>>>chunks of a program in cache, and the program's "working set" is _way_ beyond
>>>4K chunks in size, then cache is going to thrash pretty badly.
>>
>>Working set, by whatever definition, is not relevant.
>>Frequency distribution is.
>>Since *that* is the basis of caching.
>>(And analogously of compression.)
>>
>>>I've already reported that I've tested on 512K, 1024K and 2048K processors,
>>>and that I have seen an improvement every time L2 gets bigger.
>>
>>Yes, reported as not significant. But off-topic since your large tables are
>>addressed irregularly, hence never threaten hot data in L1.
>>
>>>As I said initially, my comments were _directly_ related to Crafty.  Not to
>>>other mythical programs nor mythical processor architectures.  But for Crafty,
>>>copy/make was slower on an architecture that is _very_ close to the PIV of
>>>today, albeit with 1/2 the L2 cache, and a much shorter pipeline.
>>
>>As I said initially, writing to cache (ie just writing) does not relate to
>>memory traffic. That was the issue for the last 10 days.
>>
>>I'm not challenging your results with Crafty at all, I'm only doubting them.
>>And I'd still like to see a copy simulation, preferably on different machines,
>>to put things in at least some perspective.
>
>A few years ago, when Hyatt was asked why he chose PII/PIII Xeons with less L2
>cache than was possible in his quad Xeon, he answered that there was 0%
>difference in performance for Crafty :)

I said "there was very little improvement".  I haven't changed that one bit.

I have 512K, 1M and 2M Xeons here.  Crafty is a couple of percent faster for
each jump.  Compare the prices to see why I said "don't buy the 1M/2M processors."

It's pretty clear.

It's also pretty clear that Crafty is very L2-unfriendly.

It's not hard to run two programs, and use the performance counters (read
through the MSRs) to measure cache line misses for each program.
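
Short of reading real counters, the disagreement quoted above can at least be
put into a toy model.  What follows is only a sketch under stated assumptions,
not a measurement of Crafty or of any real processor: the cache is modeled as
fully associative with LRU replacement (real caches are set-associative), the
128-byte line and the 4096-line size are the figures quoted in the thread, and
POS_BYTES, PLIES, the table size and the run() helper are hypothetical values
and names chosen for illustration.  It counts how many dirty lines get evicted
(written back) while a copy/make style walk copies a position record from one
ply to the next.

```python
import random
from collections import OrderedDict

LINE_BYTES = 128    # line size quoted in the thread for the PIV
POS_BYTES = 256     # hypothetical size of one position record
PLIES = 32          # hypothetical search depth (one record per ply)


class WriteBackCache:
    """Fully associative LRU write-back cache (a simplifying assumption)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # line number -> dirty flag, oldest first
        self.misses = 0
        self.writebacks = 0          # counted on eviction only, no final flush

    def access(self, addr, write=False):
        line = addr // LINE_BYTES
        if line in self.lines:
            dirty = self.lines.pop(line)          # hit: re-insert as newest
        else:
            self.misses += 1
            dirty = False
            if len(self.lines) >= self.num_lines:
                _, victim_dirty = self.lines.popitem(last=False)
                if victim_dirty:                  # only dirty victims cost a write
                    self.writebacks += 1
        self.lines[line] = dirty or write


def run(num_lines, table_bytes=0, iterations=200, seed=1):
    """Walk the ply stack copy/make style; return (misses, writebacks)."""
    cache = WriteBackCache(num_lines)
    rng = random.Random(seed)
    table_base = PLIES * POS_BYTES                # table sits after the stack
    for _ in range(iterations):
        for ply in range(PLIES - 1):
            src, dst = ply * POS_BYTES, (ply + 1) * POS_BYTES
            for off in range(0, POS_BYTES, LINE_BYTES):
                cache.access(src + off)               # read parent position
                cache.access(dst + off, write=True)   # write child position
            if table_bytes:
                # one read-only probe into a large, irregularly addressed table
                cache.access(table_base + rng.randrange(table_bytes))
    return cache.misses, cache.writebacks


if __name__ == "__main__":
    print("stack only, 4096 lines:   ", run(4096))
    print("stack + 64M table probes: ", run(4096, table_bytes=64 << 20))
    print("stack only, 32 lines:     ", run(32))
```

In this model the copy stack generates no write-backs as long as it fits in
the cache and stays hot, even with random table probes mixed in, and it starts
writing lines back once the cache has fewer lines than the stack needs.  How
close a real program is to either case is exactly what the counters would have
to show.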

>
>>
>>
>>[...]
>>>Of course X86 is crippled for many more reasons than that.  8 registers for
>>>starters.  :)
>>
>>Well, here is *something* we agree on.
>>
>>... Johan




Last modified: Thu, 15 Apr 21 08:11:13 -0700
