Computer Chess Club Archives


Subject: Re: working set is < 384k

Author: Vincent Diepeveen

Date: 09:48:12 08/22/04



On August 21, 2004 at 17:29:32, Robert Hyatt wrote:

>On August 21, 2004 at 08:41:27, Vincent Diepeveen wrote:
>
>>On August 20, 2004 at 11:51:27, Tom Kerrigan wrote:
>>
>>>On August 20, 2004 at 10:51:50, Robert Hyatt wrote:
>>>
>>>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>>>
>>>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>>>set size:
>>>>>
>>>>>Sempron:    1,080,020 NPS
>>>>>Athlon 64:  1,080,230 NPS
>>
>>Did you test in 32-bit mode?
>>
>>In 64-bit mode the instructions are slightly bigger, but you need fewer of
>>them, so there is less stress on the processor and more on the L2 and main
>>memory.
>>
>>>>>http://www.anandtech.com/linux/showdoc.aspx?i=2170&p=3
>>>>>
>>>>>This should prove once and for all that Crafty's working set is < 256k and
>>>>>therefore that size of L2 cache has no effect on its performance (as long as
>>>>>it's >= 256k) and that main memory speed likely plays a trivial role
>>>>>performance-wise.
>>>>>
>>>>>I bring this up because of all of the long debates that have occurred in the
>>>>>past about the value of L2 cache, the speed of memory, and the working set size
>>>>>of chess programs.
>>>>>
>>>>>I have no doubt that Crafty uses a bunch of memory, but obviously not with
>>>>>enough temporal locality for it to matter one iota.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>Your interpretation _could_ be seriously flawed.  I.e., suppose its working set is
>>>>2MB?  You can't conclude anything if that is true, as both the 256K and 512K
>>>>would be thrashing equally.
>>>
>>>How is it they would be thrashing equally? Let's say a cache access takes 5ns
>>>and main memory takes 50ns. Average access times for 2MB working set:
>>>256k cache: (256k/2MB)*5ns + ((2MB-256k)/2MB)*50ns = 44.37ns
>>
>>A number of things, Tom.
>>
>>a1) First of all you need to prove that the L2 caches of both CPUs deliver
>>data in the same number of cycles.
>>
>>a2) What type of memory is used with the processors? If the 256KB processor has
>>400MHz CL2 memory and the other one has 266MHz CL3 memory, then the comparison
>>would look odd.
>>
>>b) You forgot to add the L1 cache
>>  128 + 256 = 384KB cache
>>
>>I assume your claim is working set size < 384 KB
>>
>>c) How big did you set the hashtable size? A specint2000-style comparison where
>>only 2 MB or something similar gets used is not really interesting. We want 400MB
>>of memory or so.
>>
>>d) Which version of Crafty did you use? Current versions do only sequential
>>lookups in 1 table, while the old specint one does 2 lookups in 2 different tables.
>>
>>e) When using a big hashtable, the fastest you can fetch a hash entry is around
>>91 ns; that assumes 400MHz CL2 memory. On dual Opterons, for example, it is hard
>>to get under 133 ns.
>>
>>f) You really should do 64-bit tests; we've had enough 32-bit tests already.
>>
>>None of this takes away that I agree with the conclusion that for a
>>chess program running on a SINGLE CPU it doesn't matter whether the L2 cache is
>>256, 512 or 2048 KB. The really important things are the L1 cache size, the L2
>>cache random lookup SPEED and how fast the main memory can randomly access
>>hashtables.
>>
>>>512k cache: (512k/2MB)*5ns + ((2MB-512k)/2MB)*50ns = 38.75ns
>>
>>>That's 15% faster. You'd think a difference that big would show up in the
>>>benchmark score but it doesn't. Or are you going to claim that Crafty always
>>>uses memory that it hasn't used for the last ~512k?
>>
>>I agree with the math when the L2 cache speed is the same. However, you
>>must *prove* that first. There are big differences even from processor line to
>>processor line.
>>
>>I remember that the Northwood P4 was a lot faster for DIEP than the P4 generation
>>before it, and yet everyone acts as if it is the same processor. For DIEP
>>there was a 20% difference between P3 and P2 speed; for Fritz it didn't matter
>>at all. And so on.
>>
>>>>The only _real_ way is to test with larger sizes as well.  I've done that up to 2MB
>>>>and saw improvement from 512K to 1024K and from 1024K to 2048K, on older Xeons.
>>>
>>>Sure. And if other people run those tests and don't see a difference, you'd say
>>>a chip needed a 4MB/8MB cache for anybody to be sure.
>>
>>You are correct here. That's how Bob works.
>
>Nah.  That is how _you_ work.  Give one example and claim it 'proofs' your
>point.  One example doesn't prove something.  It can disprove a claim easily
>enough however.  You've been burned enough times that way to understand...

Try to disprove that Crafty's working set size is under 384KB.
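
For anyone who wants to redo the weighted-average estimate quoted above with other sizes, here is a minimal Python sketch. It assumes, as that estimate does, that accesses are spread uniformly over the working set, and it reuses the illustrative 5ns and 50ns latencies. The function name and its parameters are made up for this example; none of the numbers are measurements from either chip.

```python
def avg_access_ns(cache_kb, working_set_kb, cache_ns=5.0, mem_ns=50.0):
    """Average access time under a uniform-random access model.

    Assumption: every byte of the working set is equally likely to be
    touched, so the hit rate is simply cache size / working set size.
    The 5ns and 50ns defaults are the illustrative figures from the thread.
    """
    hit_rate = min(1.0, cache_kb / working_set_kb)
    return hit_rate * cache_ns + (1.0 - hit_rate) * mem_ns


if __name__ == "__main__":
    working_set_kb = 2048  # the hypothetical 2MB working set
    for cache_kb in (256, 512):
        t = avg_access_ns(cache_kb, working_set_kb)
        print(f"{cache_kb}k cache: {t:.3f} ns average access")

    # Ratio of the two averages: how much faster the larger cache would be
    ratio = avg_access_ns(256, working_set_kb) / avg_access_ns(512, working_set_kb)
    print(f"256k/512k time ratio: {ratio:.3f}")
```

Passing 384 and 640 instead of 256 and 512 counts the 128KB of L1 as well, per point b) above. With a working set that fits in the smaller cache, both calls return the cache latency and the ratio is 1, which is what the near-identical NPS figures would look like under this model.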

>
>
>>
>>>I have easy access to 2GHz Athlon 64s with 512k and 1MB cache... if somebody can
>>>point me to a Windows Crafty executable and tell me what to type, I'll happily
>>>run the test.
>>
>>>-Tom
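
The "thrashing equally" objection and the 15% estimate quoted above rest on different access patterns, and a toy simulation makes the gap visible. The Python sketch below assumes a fully associative LRU cache and counts in abstract 1KB units, neither of which matches a real Athlon 64 or Xeon. `lru_hit_rate`, `cyclic_scan` and `uniform_random` are hypothetical helpers written for this illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


def cyclic_scan(n_units, passes):
    """Sweep the whole working set in order, over and over."""
    return [i for _ in range(passes) for i in range(n_units)]


def uniform_random(n_units, count, seed=0):
    """Touch units of the working set uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(n_units) for _ in range(count)]


if __name__ == "__main__":
    working_set = 2048  # 2MB in 1KB units (assumed granularity)
    for capacity in (256, 512):
        scan = lru_hit_rate(cyclic_scan(working_set, 4), capacity)
        rand = lru_hit_rate(uniform_random(working_set, 100_000), capacity)
        print(f"{capacity}k cache: cyclic hit rate {scan:.3f}, "
              f"random hit rate {rand:.3f}")
```

A strict cyclic sweep over 2MB never hits in either cache, which is the case where both sizes thrash equally. Uniformly random accesses hit roughly in proportion to cache size, which is the case behind the 15% figure. The sketch says nothing about which pattern Crafty actually follows.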




Last modified: Thu, 15 Apr 21 08:11:13 -0700
