Computer Chess Club Archives


Subject: Re: Sempron vs. Athlon 64: Proof that Crafty's working set is < 256k

Author: Robert Hyatt

Date: 10:01:24 08/20/04


On August 20, 2004 at 11:51:27, Tom Kerrigan wrote:

>On August 20, 2004 at 10:51:50, Robert Hyatt wrote:
>
>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>
>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>set size:
>>>
>>>Sempron:    1,080,020 NPS
>>>Athlon 64:  1,080,230 NPS
>>>
>>>http://www.anandtech.com/linux/showdoc.aspx?i=2170&p=3
>>>
>>>This should prove once and for all that Crafty's working set is < 256k and
>>>therefore that size of L2 cache has no effect on its performance (as long as
>>>it's >= 256k) and that main memory speed likely plays a trivial role
>>>performance-wise.
>>>
>>>I bring this up because of all of the long debates that have occurred in the
>>>past about the value of L2 cache, the speed of memory, and the working set size
>>>of chess programs.
>>>
>>>I have no doubt that Crafty uses a bunch of memory, but obviously not with
>>>enough temporal locality for it to matter one iota.
>>>
>>>-Tom
>>
>>
>>Your interpretation _could_ be seriously flawed.  I.e., suppose its working set is
>>2MB?  You can't conclude anything if that is true, as both the 256K and 512K
>>caches would be thrashing equally.
>
>How is it they would be thrashing equally? Let's say a cache access takes 5ns
>and main memory takes 50ns. Average access times for 2MB working set:

Simple.  Take an array of 1.9 megabytes, plus the code to loop over the array
sequentially, over and over.

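Your 256K cache will be just as fast as the 512K cache, as you are getting no
data reuse at all: with LRU-style replacement, each line is evicted before the
loop comes back around to it.  But once the cache reaches 2MB, it all fits in
cache and runs much faster...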
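
The array example above can be checked with a small simulation. This is only a sketch under stated assumptions: a fully associative cache with strict LRU replacement, 64-byte lines, and data accesses only (the loop's code footprint is ignored). Real L2 caches are set-associative and may use other replacement policies, so the numbers illustrate the effect rather than predict hardware. The name `hit_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict

LINE = 64   # assumed cache-line size in bytes
KB = 1024

def hit_rate(cache_bytes, array_bytes, passes=4):
    """Hit rate of a fully associative LRU cache for repeated sequential sweeps."""
    capacity = cache_bytes // LINE   # number of lines the cache can hold
    n_lines = array_bytes // LINE    # number of lines in the array
    cache = OrderedDict()            # keys ordered from least to most recently used
    hits = total = 0
    for p in range(passes):
        for line in range(n_lines):
            counted = p > 0          # the first pass is a cold start, so skip it
            if counted:
                total += 1
            if line in cache:
                cache.move_to_end(line)        # mark as most recently used
                if counted:
                    hits += 1
            else:
                cache[line] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / total

array = int(1.9 * 1024 * KB)   # the 1.9-megabyte array from the example
for size_kb in (256, 512, 1024, 2048):
    rate = hit_rate(size_kb * KB, array)
    print(f"{size_kb:>5}K cache: hit rate {rate:.2f}")
```

Under these assumptions, every cache smaller than the array gets a hit rate of zero after the first pass, so 256K and 512K behave identically, while the 2048K cache hits on every access.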


>
>256k cache: (256k/2MB)*5ns + ((2MB-256k)/2MB)*50ns = 44.38ns
>512k cache: (512k/2MB)*5ns + ((2MB-512k)/2MB)*50ns = 38.75ns
>
>That's 15% faster. You'd think a difference that big would show up in the
>benchmark score but it doesn't. Or are you going to claim that Crafty always
>uses memory that it hasn't used for the last ~512k?
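
The quoted estimate treats the hit fraction as cache size divided by working set size, which amounts to assuming accesses are spread uniformly at random over the working set. A minimal sketch of that model follows; the function name `avg_access_ns` is hypothetical, and the 5ns and 50ns latencies are the illustrative figures from the quote, not measurements.

```python
def avg_access_ns(cache_kb, working_set_kb, hit_ns=5.0, miss_ns=50.0):
    """Average access time if accesses are uniform over the working set."""
    # Fraction of the working set that is resident in cache (at most all of it).
    hit_fraction = min(1.0, cache_kb / working_set_kb)
    return hit_fraction * hit_ns + (1.0 - hit_fraction) * miss_ns

small = avg_access_ns(256, 2048)   # 256k cache, 2MB working set
large = avg_access_ns(512, 2048)   # 512k cache, 2MB working set
print(f"256k cache: {small:.3f} ns")
print(f"512k cache: {large:.3f} ns")
print(f"ratio: {small / large:.3f}")
```

The model gives 44.375ns and 38.75ns. It says nothing about access patterns with structure, such as a sequential sweep, where the hit fraction can be zero regardless of how much of the working set the cache could hold.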


I am not going to claim anything other than that your claim is flawed, because
there is a simple example of where it fails given above.




>
>>Only _real_ way is to test with larger sizes as well.  I've done that up to 2mb
>>and saw improvement from 512 to 1024K and from 1024K to 2048K, on older xeons.
>
>Sure. And if other people run those tests and don't see a difference, you'd say
>a chip needed a 4MB/8MB cache for anybody to be sure.


That would be a correct assumption unless you _know_ what the actual working set
size of the program really is, and are certain that random hash probes
and other table lookups are not aliasing to often-used code and blowing it out
of cache frequently.  On the xeons, I saw improvement from 512K to 1024K to 2048K.
But that was at least 3-4 years ago when I benchmarked Crafty before buying my
first quad xeon box.  Whether I have changed that or not is anybody's guess as
the program has seen marked changes, from move generation (no more
COMPACT_ATTACKS) to less shared data for NUMA boxes.
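
The aliasing concern can be illustrated with a toy model. This is a sketch under loud assumptions, not a model of Crafty or of any real CPU: the cache is direct-mapped with 64-byte lines, the "hot" region is a contiguous 128K block touched once per iteration, and each iteration also makes a fixed number of uniformly random probes into a 64MB table. All names and sizes here are hypothetical.

```python
import random

LINE = 64   # assumed cache-line size in bytes
KB = 1024

def hot_hit_rate(cache_bytes, probes_per_iter, hot_bytes=128 * KB,
                 table_bytes=64 * 1024 * KB, iters=200, seed=1):
    """Hit rate on the hot region when random table probes share the cache."""
    rng = random.Random(seed)
    n_sets = cache_bytes // LINE
    tags = [None] * n_sets            # direct-mapped: one resident line per set
    hot_lines = hot_bytes // LINE     # hot region occupies lines 0..hot_lines-1
    table_base = 1 << 20              # table placed far from the hot region
    table_lines = table_bytes // LINE

    def touch(line):
        idx = line % n_sets           # set index chosen by the low address bits
        hit = tags[idx] == line
        tags[idx] = line              # a miss replaces whatever was resident
        return hit

    hits = total = 0
    for it in range(iters):
        for line in range(hot_lines):
            hit = touch(line)
            if it > 0:                # ignore the cold first iteration
                total += 1
                hits += hit
        for _ in range(probes_per_iter):
            touch(table_base + rng.randrange(table_lines))
    return hits / total

for size_kb in (256, 512):
    quiet = hot_hit_rate(size_kb * KB, probes_per_iter=0)
    noisy = hot_hit_rate(size_kb * KB, probes_per_iter=256)
    print(f"{size_kb}K cache: hot hit rate {quiet:.3f} without probes, {noisy:.3f} with")
```

In this model the hot region fits comfortably in either cache, yet the random probes still evict part of it every iteration, so the effective footprint is larger than the hot region alone.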

But regardless, I would hardly run on 256K and 512K and conclude that the
working set is < 512K just because 512K was no faster.  That conclusion is
fundamentally flawed.





>
>I have easy access to 2GHz Athlon 64s with 512k and 1MB cache... if somebody can
>point me to a Windows Crafty executable and tell me what to type, I'll happily
>run the test.
>
>-Tom


Best way to compare is to simply start the crafty executable in a directory with
nothing else in it, and type "bench".  That runs with a small hash table and
everything else, and will report the NPS after running 6 test positions...




Last modified: Thu, 15 Apr 21 08:11:13 -0700
