Computer Chess Club Archives


Subject: Re: Sempron vs. Athlon 64: Proof that Crafty's working set is < 256k

Author: Robert Hyatt

Date: 18:28:20 08/20/04



On August 20, 2004 at 17:52:54, Tom Kerrigan wrote:

>On August 20, 2004 at 17:36:51, Robert Hyatt wrote:
>
>...
>
>>As I said, I don't know.  But clearly testing 256K vs 512K doesn't provide much
>>actual data to draw conclusions from.  Obviously the 2048K chip was not 5x
>
>What is it that you don't know? If a program's working set doesn't fit into
>cache, then adding more cache will always increase performance, assuming a
>completely random access pattern.

Why do you get to make such an assumption?  I +specifically+ try to do lots of
sequential accesses to take advantage of cache line fills that pre-fetch data...
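
To illustrate why sequential access benefits from line fills, here is a minimal sketch. It assumes 64-byte cache lines and 8-byte entries; both figures and all names are illustrative assumptions, not Crafty's actual data layout. It counts how often an access lands on a different line than the previous access did.

```python
def line_fills(addresses, line_bytes=64):
    """Count accesses that touch a different cache line than the previous access.

    This models only a single line buffer, not a real cache: it shows how
    many new lines a given access order has to bring in back to back.
    """
    fills = 0
    last_line = None
    for addr in addresses:
        line = addr // line_bytes
        if line != last_line:
            fills += 1
            last_line = line
    return fills


ENTRY_BYTES = 8   # assumed entry size
N = 1024          # number of accesses in each trace

# Neighbouring entries: eight of them share each 64-byte line.
sequential = [i * ENTRY_BYTES for i in range(N)]
# One entry per line: every access needs a different line.
scattered = [i * 64 for i in range(N)]

if __name__ == "__main__":
    print("sequential fills:", line_fills(sequential))
    print("scattered fills: ", line_fills(scattered))
```

Under these assumptions the sequential trace needs one fill per eight accesses, while the scattered trace needs a fill on every access.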


>With chess programs, memory access is not
>random at all, it's obviously biased towards reusing data, which would increase
>performance even more. (Chess programs are obviously not full of loops that just
>read and write 2MB arrays.)

Again, so what?  You run a test with two cache sizes and "prove" (your words)
that the working set is < 256K.  That is a "Vincent proof".  And all it takes to
toss out such a proof is one example of where it is wrong.  I gave one.

Without definitive data, however, all of this is nonsense.  You can't prove
anything with one observation.  You _can_ disprove something since one exception
is enough.  But forget the "proof" stuff.

I gave you my data.  512K-1024K was about 10%, 1024K-2048K was another 7%.  Again,
these are from memory.  Eugene also gave you some numbers.




>
>The only reason why a chess program's performance wouldn't increase with size of
>L2 cache is because its working set fits into the cache.


Baloney, but I won't argue the point further.  I've given one easy-to-understand
exception to the above.  If its working set size is _significantly_ bigger than
cache, then doubling L2 might not help.  In fact, look up "Belady's anomaly" for
examples of where doubling the resident set size (same as doubling the cache
size) actually _increases_ paging rate.  So making the above statement "the only
reason ..." is simply _wrong_.
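
Belady's anomaly can be reproduced in a few lines. The sketch below assumes FIFO replacement, the policy the anomaly is classically shown for, and uses the standard textbook reference string. It illustrates the principle only and does not model any real CPU cache, which is typically set-associative with an LRU-like policy. The function name is hypothetical.

```python
from collections import deque


def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with a fixed number of frames."""
    resident = set()
    order = deque()  # oldest resident page sits at the left
    faults = 0
    for page in refs:
        if page in resident:
            continue  # hit: FIFO does not reorder pages on access
        faults += 1
        if len(resident) == frames:
            victim = order.popleft()  # evict the page loaded longest ago
            resident.remove(victim)
        resident.add(page)
        order.append(page)
    return faults


# Classic reference string used to demonstrate the anomaly.
REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]

if __name__ == "__main__":
    for frames in (3, 4):
        print(frames, "frames:", fifo_faults(REFS, frames), "faults")
```

With this string, 3 frames give 9 faults and 4 frames give 10, so adding memory makes the fault count worse.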



>
>I don't know why this upsets you so much. I know that you think Crafty uses a
>bunch of huge arrays frequently enough and randomly enough to blow out the cache
>but you have no evidence of this, and there is evidence that indicates
>otherwise. If anything, I'd be happy about having a program that runs almost
>entirely in a chip's on-die cache. That means you're immune to the ever-growing
>disparity between MPU and main memory performance.


Doesn't upset me at all.  I simply corrected a false statement (a proof) that
really didn't prove anything.  All I care about is how it plays and how it
behaves on the various hardware platforms I have access to.  I spend exactly
_zero_ time wondering "how big is the cache footprint?" since I don't plan on
doing anything to make it smaller, quite the opposite in fact...

I'd love to live in the on-chip cache too.  But if I do, then why did I get
faster with bigger cache on at least two different platforms (Xeon and IA64)???

It isn't easy to compare speeds in fact.  Different memory bus widths.
Different cache line sizes (e.g., PIII vs PIV vs Opteron).  Different bus speeds.
Different memory speeds.  Different cache speeds.  Different set associativity.
Cache line / memory aliasing issues.  The list goes on and on.  One simple test
won't sort all of that out very well.




>
>>More I can't conclude without any way to do testing.  I might look up the cache
>>modeling software and try that to see what it says, for fun...
>
>Why bother? Just pick a big array that you think is accessed frequently and
>randomly and instrument it. Print out which elements are accessed and when and
>you can easily get an idea of whether or not the accesses are hitting cache. (Or
>if it's being accessed so infrequently that it doesn't matter.)
>
>-Tom

That would produce zillions of numbers.  I can't deal with that myself.  I'd
rather have a program measure it more accurately.
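
As a rough idea of what "having a program measure it" could look like, here is a minimal sketch of a cache model fed with an address trace. It assumes a fully associative LRU cache with 64-byte lines, which is a simplification of real L2 caches, and a synthetic trace that scans one array over and over. The names and the trace are hypothetical and are not taken from Crafty or from any particular cache-modeling package.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size


def lru_hit_rate(addresses, cache_bytes, line_bytes=LINE_BYTES):
    """Hit rate of a fully associative LRU cache over a byte-address trace."""
    capacity = cache_bytes // line_bytes
    lines = OrderedDict()  # least recently used line is first
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        if line in lines:
            hits += 1
            lines.move_to_end(line)  # mark as most recently used
        else:
            if len(lines) >= capacity:
                lines.popitem(last=False)  # evict least recently used
            lines[line] = True
    return hits / total if total else 0.0


def cyclic_scan(array_bytes, passes, stride=LINE_BYTES):
    """Synthetic trace: walk an array front to back, several times over."""
    for _ in range(passes):
        for addr in range(0, array_bytes, stride):
            yield addr


if __name__ == "__main__":
    KB = 1024
    for cache_kb in (256, 512, 4096):
        rate = lru_hit_rate(cyclic_scan(2048 * KB, 3), cache_kb * KB)
        print(f"{cache_kb}K cache: hit rate {rate:.2f}")
```

For this particular trace, a 2048K array scanned cyclically, the model reports the same hit rate (zero) for 256K and 512K, because each line is evicted before it is reused. That is one case where doubling the cache changes nothing even though the working set does not fit. A real measurement would replace cyclic_scan with a trace recorded from the program.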



