Computer Chess Club Archives


Subject: Re: Full circle

Author: Vincent Diepeveen

Date: 09:46:04 08/29/03



On August 29, 2003 at 08:53:50, Robert Hyatt wrote:

>On August 29, 2003 at 02:34:42, Johan de Koning wrote:
>
>>On August 28, 2003 at 11:45:35, Robert Hyatt wrote:
>>
>>>On August 28, 2003 at 01:50:46, Johan de Koning wrote:
>>>
>>>>On August 27, 2003 at 12:25:51, Robert Hyatt wrote:
>>>>
>>>>>On August 26, 2003 at 21:12:45, Johan de Koning wrote:
>>>>
>>>>[snip]
>>>>
>>>>It seems we're finally back to where you and me started off.
>>>>
>>>>>>You keep saying that copy/make causes problems with cache-to-memory traffic.
>>>>>>Here I was just saying it doesn't, if cache is plenty.
>>>>>
>>>>>Here is the problem:
>>>>>
>>>>>When you write to a line of cache, you _guarantee_ that entire line of cache
>>>>>is going to be written back to memory.  There are absolutely no exceptions to
>>>>>that.  So copying from one cache line to another means that "another line" is
>>>>>going to generate memory traffic.
>>>>
>>>>Here is the solution: write-through caches were abandoned a long time ago.
>>>
>>>I'm not talking about write-through.
>>
>>I'm glad you aren't. :-)
>>
>>>  I am talking about write-back.  Once
>>>you modify a line of cache, that line of cache _is_ going to be written back
>>>to memory.  When is hard to predict, but before it is replaced by another cache
>>>line, it _will_ be written back.  So you write one byte to cache on a PIV, you
>>>are going to dump 128 bytes back to memory at some point.  With only 4096 lines
>>>of cache, it won't be long before that happens...  And there is no way to
>>>prevent it.
>>
>>Sure, every dirty cache line will be written back at *some* point. But you're
>>allowed to use or update it a million times before it is flushed only once.
>>Number of cache lines has nothing to do with it. On a lean and empty system
>>some lines might even survive until after program termination.
>
>Number of cache lines has everything to do with it.  If you can keep 4K
>chunks of a program in memory, and the program is _way_ beyond 4K chunks
>in size of the "working set", then cache is going to thrash pretty badly.
>I've already reported that I've tested on 512K, 1024K and 2048K processors,
>and that I have seen an improvement every time L2 gets bigger.
>
>As I said initially, my comments were _directly_ related to Crafty.  Not to
>other mythical programs nor mythical processor architectures.  But for Crafty,
>copy/make was slower on an architecture that is _very_ close to the PIV of
>today, albeit with 1/2 the L2 cache, and a much shorter pipeline.
>
>
>>
>>>>And for good reason, think of the frequency at which data is written (e.g. just
>>>>stack frame). Once CPU speed / RAM speed hits 10 or so, write-through cache will
>>>>cause almost any program to run RAM bound.
>>>
>>>Sure, but that wasn't what I was talking about.  Once a line is "dirty" it is
>>>going back to memory when it is time to replace it.  With just 4K lines of
>>>cache, they get recycled very quickly.
>>>
>>>>
>>>>>>>  I claimed that for _my_ program,
>>>>>>>copy/make burned the bus up and getting rid of it made me go 25% faster.
>>>>>>
>>>>>>And I suspect this was because of a tiny cache that couldn't even hold the
>>>>>>heavily used stuff.
>>>>>
>>>>>This was on a (originally) pentium pro, with (I believe) 256K of L2 cache.
>>>>
>>>>L2 is not a good place to keep your heavily used data.
>>>
>>>There's no other choice.  L1 is not big enough for anything.
>>
>>It's big enough to hold your position and top of stack. It's even big enough to
>>hold *my* position of 22000 bytes, except for the rarely addressed parts.
>>
>
>It isn't big enough to hold even the stuff I need to generate moves.  I have
>multiple arrays of 64 X 256 X 8bytes, that I use repeatedly.  One of those
>is enough to zap L1, although I don't need the entire table at one shot.  But
>I do need parts of four of those, and that is just for starters...
>
>
>
>
>
>>The less heavily used data will live briefly in the LRU lines but is typically
>>not dirty. Though it is certainly possible to get unlucky and flush hot data,
>>depending on memory lay-out and program flow.
>>
>>>  IE the pentium
>>>pro had 16K of L1, 8K data, 8K instruction.  Newer pentiums are not much
>>>better although the 8K instruction has been replaced by the new trace cache
>>>that holds more than 8KB.  And the data cache is up to 16K.  However, I have
>>>run personally on xeons with 512K L2, 1024K L2 and 2048K L2 and I didn't see
>>>any significant difference in performance for my program...  Bigger is slightly
>>>better in each case, but it was never "big enough".
>>
>>I guess most of your tables are pretty sparse in terms of access frequency. So
>>you might get away with 2048 lines of L2. In fact I'm pretty sure you get away
>>with it since a few RAM accesses per node would kill any 1+ MN/s badly.
>>
>>But regarding L1 size: Intel's policy simply sucks. :-)
>
>I wouldn't say it "sucks".  You _can_ get a 2048K L2 cache xeon.  If you can
>afford it.  :)

http://www.intel.com/ebusiness/products/server/processor/xeon_mp/index.htm

But that 2MB L2 cache Xeon, ignoring of course that it is priced at about what my
car is worth, is clocked at only 2.8GHz.

So that will be blown away by any $60 K7 processor.

Not to mention opteron.

Speaking of Opteron, when are you going to buy that 64-bit CPU?

For 10 years you have been crying out for 64 bits. Now there is a 64-bit
CPU and you don't have such a system yet?

IBM E325 looks cool. Quad Opteron 2.0GHz.

>Bigger L1 would be nice, and it will probably happen soon.
>
>Of course X86 is crippled for many more reasons than that.  8 registers for
>starters.  :)

16 registers on x86-64.

Rumours say Intel will take until 2005 before you can buy their x86-64 CPU. Are
you going to wait until then, or buy an Opteron already?

>
>>
>>... Johan
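
The sizes thrown around in the quoted discussion are easy to check with a few lines of arithmetic. This is only a rough sketch in Python: it takes the 128-byte line, the 4096-line L2, the 64 X 256 X 8 byte tables and the 22000-byte position exactly as quoted above, and the helper name lines_needed is made up for illustration, not taken from anyone's engine. It also assumes data starts on a line boundary, which real layouts may not.

```python
# Figures as quoted in the thread; everything else is derived from them.
LINE_BYTES = 128             # "dump 128 bytes back to memory" (PIV L2 line, as quoted)
L2_LINES = 4096              # "only 4096 lines of cache"
TABLE_BYTES = 64 * 256 * 8   # one "64 X 256 X 8bytes" table
L1_DATA_BYTES = 16 * 1024    # "the data cache is up to 16K"
POSITION_BYTES = 22000       # the 22000-byte position mentioned by Johan


def lines_needed(nbytes, line_bytes=LINE_BYTES):
    """Cache lines needed to hold nbytes, rounded up (hypothetical helper).

    Assumes the data is line-aligned; misaligned data can touch one more line.
    """
    return -(-nbytes // line_bytes)


# Total L2 capacity implied by the quoted line count and line size.
l2_bytes = LINE_BYTES * L2_LINES

print("L2 capacity:", l2_bytes // 1024, "KiB")
print("one table:", TABLE_BYTES // 1024, "KiB =", lines_needed(TABLE_BYTES), "lines")
print("fraction of one table that fits in 16K L1 data:", L1_DATA_BYTES / TABLE_BYTES)
print("lines spanned by a 22000-byte position:", lines_needed(POSITION_BYTES))
```

So 4096 lines of 128 bytes is exactly the 512K L2, one of those tables alone is a quarter of that, and a 22000-byte position spans 172 such lines. How many of those lines are actually hot or dirty at any moment is the part the arithmetic cannot tell you.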



