Author: Robert Hyatt
Date: 14:29:32 08/21/04
On August 21, 2004 at 08:41:27, Vincent Diepeveen wrote:

>On August 20, 2004 at 11:51:27, Tom Kerrigan wrote:
>
>>On August 20, 2004 at 10:51:50, Robert Hyatt wrote:
>>
>>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>>
>>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>>set size:
>>>>
>>>>Sempron: 1,080,020 NPS
>>>>Athlon 64: 1,080,230 NPS
>
>Did you test in 32 bits mode or so?
>
>In 64 bits mode the instruction sizes are bigger (though slightly), but you need
>fewer instructions, so the stress is less on the processor then and more on the
>L2/ main memory.
>
>>>>http://www.anandtech.com/linux/showdoc.aspx?i=2170&p=3
>>>>
>>>>This should prove once and for all that Crafty's working set is < 256k and
>>>>therefore that size of L2 cache has no effect on its performance (as long as
>>>>it's >= 256k) and that main memory speed likely plays a trivial role
>>>>performance-wise.
>>>>
>>>>I bring this up because of all of the long debates that have occurred in the
>>>>past about the value of L2 cache, the speed of memory, and the working set size
>>>>of chess programs.
>>>>
>>>>I have no doubt that Crafty uses a bunch of memory, but obviously not with
>>>>enough temporal locality for it to matter one iota.
>>>>
>>>>-Tom
>>>
>>>
>>>Your interpretation _could_ be seriously flawed. IE suppose its working set is
>>>2mb? You can't conclude anything if that is true as both the 256K and 512K
>>>would be thrashing equally.
>>
>>How is it they would be thrashing equally? Let's say a cache access takes 5ns
>>and main memory takes 50ns. Average access times for 2MB working set:
>>256k cache: (256k/2MB)*5ns + ((2MB-256k)/2MB)*50ns = 44.37ns
>
>A number of things Tom.
>
>a1) First of all you need to prove that the L2 caches of both cpu's are giving
>data in the same number of cycles
>
>a2) what type of memory is used with the processors? If 256KB processor has
>400Mhz CL2 memory and the other one has 266Mhz CL3 memory then the comparison
>would look odd.
>
>b) You forgot to add the L1 cache
> 128 + 256 = 384KB cache
>
>I assume your claim is working set size < 384 KB
>
>c) how big did you set the hashtable size to? Some specint2000 comparison where
>only 2 MB or something similar gets used is not real interesting. We want 400MB
>memory or so.
>
>d) which version of crafty did you use? Nowadays versions do only sequential
>lookups in 1 table and old specint one is doing 2 lookups in 2 different tables.
>
>e) when using a big hashtable the fastest way to get hashentries is around 91 ns
>that assumes 400Mhz memory and CL2 memory. For example dual opterons it is hard
>to get under 133 ns.
>
>f) you really should do 64 bits tests, we've had enough 32 bits tests already.
>
>This all doesn't take away that i agree with the conclusion that for a
>chessprogram when running SINGLE cpu it doesn't matter whether the L2 cache is
>256 or 512 or 2048 KB. The real important things are the L1 cache size, the L2
>cache random lookup SPEED and how fast the main memory can randomly access
>hashtables.
>
>>512k cache: (512k/2MB)*5ns + ((2MB-512k)/2MB)*50ns = 38.75ns
>
>>That's 15% faster. You'd think a difference that big would show up in the
>>benchmark score but it doesn't. Or are you going to claim that Crafty always
>>uses memory that it hasn't used for the last ~512k?
>
>I agree with the math when the L2 cache speed is the same speed. However you
>must *prove* those first. There are big differences even from processor line to
>processor line.
>
>I remember that northwood P4 was a lot faster for DIEP than the generation P4
>before and yet everyone is doing as if it is the same processor even. For diep
>there was a 20% difference between P3 and P2 speed, for fritz it didn't matter
>anything. And so on.
>
>>>Only _real_ way is to test with larger sizes as well. I've done that up to 2mb
>>>and saw improvement from 512 to 1024K and from 1024K to 2048K, on older xeons.
>>
>>Sure. And if other people run those tests and don't see a difference, you'd say
>>a chip needed a 4MB/8MB cache for anybody to be sure.
>
>You are correct here. That's how Bob works.

Nah. That is how _you_ work. Give one example and claim it 'proofs' your point. One example doesn't prove something. It can disprove a claim easily enough however. You've been burned enough times that way to understand...

>
>>I have easy access to 2GHz Athlon 64s with 512k and 1MB cache... if somebody can
>>point me to a Windows Crafty executable and tell me what to type, I'll happily
>>run the test.
>
>>-Tom
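
For anyone who wants to check the arithmetic being argued over in the quotes, here is a rough sketch of the two competing pictures. The first function is just Tom's quoted model (5ns cache, 50ns main memory, accesses spread uniformly over the working set). The second is only an illustration of the "thrashing equally" case: it assumes an LRU cache and a program that sweeps cyclically through its whole working set, which is an assumption made here for the example and not a measurement of Crafty. The function names and the scaled-down sizes (32 lines of working set, 4 and 8 lines of cache, the same 1/8 and 1/4 ratios as 256k and 512k against 2MB) are likewise made up for the sketch.

```python
from collections import OrderedDict


def avg_access_ns(cache_kb, working_set_kb, cache_ns=5.0, mem_ns=50.0):
    """Average access time if accesses are uniform over the working set.

    This is the back-of-envelope model from the quoted post: the hit
    fraction is simply cache size divided by working set size.
    """
    hit = min(cache_kb / working_set_kb, 1.0)
    return hit * cache_ns + (1.0 - hit) * mem_ns


def lru_hit_rate(cache_lines, accesses):
    """Hit rate of a fully associative LRU cache over an access sequence.

    Assumption for illustration only: real L2 caches are set associative
    and real access patterns are not a clean sweep.
    """
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            if len(cache) >= cache_lines:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return hits / len(accesses)


# Uniform model, 2MB working set: 256k vs 512k cache.
t256 = avg_access_ns(256, 2048)
t512 = avg_access_ns(512, 2048)
print(t256, t512, t256 / t512)

# Cyclic sweep over a working set larger than either cache.
sweep = list(range(32)) * 10
print(lru_hit_rate(4, sweep), lru_hit_rate(8, sweep))
```

Under the uniform model the 512k part comes out roughly 15% faster per access, which is Tom's point. Under the cyclic sweep with LRU both cache sizes miss on every access, which is the situation where doubling the cache buys nothing. Which picture is closer to Crafty is exactly what the benchmark numbers alone cannot settle.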
Last modified: Thu, 15 Apr 21 08:11:13 -0700