Author: Vincent Diepeveen
Date: 09:48:12 08/22/04
On August 21, 2004 at 17:29:32, Robert Hyatt wrote:

>On August 21, 2004 at 08:41:27, Vincent Diepeveen wrote:
>
>>On August 20, 2004 at 11:51:27, Tom Kerrigan wrote:
>>
>>>On August 20, 2004 at 10:51:50, Robert Hyatt wrote:
>>>
>>>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>>>
>>>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>>>set size:
>>>>>
>>>>>Sempron: 1,080,020 NPS
>>>>>Athlon 64: 1,080,230 NPS
>>
>>Did you test in 32 bits mode or so?
>>
>>In 64 bits mode the instruction sizes are bigger (though slightly), but you need
>>fewer instructions, so the stress is less on the processor then and more on the
>>L2/main memory.
>>
>>>>>http://www.anandtech.com/linux/showdoc.aspx?i=2170&p=3
>>>>>
>>>>>This should prove once and for all that Crafty's working set is < 256k and
>>>>>therefore that size of L2 cache has no effect on its performance (as long as
>>>>>it's >= 256k) and that main memory speed likely plays a trivial role
>>>>>performance-wise.
>>>>>
>>>>>I bring this up because of all of the long debates that have occurred in the
>>>>>past about the value of L2 cache, the speed of memory, and the working set size
>>>>>of chess programs.
>>>>>
>>>>>I have no doubt that Crafty uses a bunch of memory, but obviously not with
>>>>>enough temporal locality for it to matter one iota.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>Your interpretation _could_ be seriously flawed. IE suppose its working set is
>>>>2mb? You can't conclude anything if that is true as both the 256K and 512K
>>>>would be thrashing equally.
>>>
>>>How is it they would be thrashing equally? Let's say a cache access takes 5ns
>>>and main memory takes 50ns. Average access times for 2MB working set:
>>>256k cache: (256k/2MB)*5ns + ((2MB-256k)/2MB)*50ns = 44.37ns
>>
>>A number of things Tom.
>>
>>a1) First of all you need to prove that the L2 caches from both cpu's are giving
>>data in the same number of cycles
>>
>>a2) what type of memory is used with the processors? If 256KB processor has
>>400MHz CL2 memory and the other one has 266MHz CL3 memory then the comparison
>>would look odd.
>>
>>b) You forgot to add the L1 cache
>> 128 + 256 = 384KB cache
>>
>>I assume your claim is working set size < 384 KB
>>
>>c) how big did you set the hashtable size to? Some specint2000 comparison where
>>only 2 MB or something similar gets used is not real interesting. We want 400MB
>>memory or so.
>>
>>d) which version of crafty did you use? Nowadays versions do only sequential
>>lookups in 1 table and old specint one is doing 2 lookups in 2 different tables.
>>
>>e) when using a big hashtable the fastest way to get hashentries is around 91 ns
>>that assumes 400MHz memory and CL2 memory. For example dual Opterons it is hard
>>to get under 133 ns.
>>
>>f) you really should do 64 bits tests, we've had enough 32 bits tests already.
>>
>>This all doesn't take away that i agree with the conclusion that for a
>>chessprogram when running SINGLE cpu it doesn't matter whether the L2 cache is
>>256 or 512 or 2048 KB. The real important things are the L1 cache size, the L2
>>cache random lookup SPEED and how fast the main memory can randomly access
>>hashtables.
>>
>>>512k cache: (512k/2MB)*5ns + ((2MB-512k)/2MB)*50ns = 38.75ns
>>
>>>That's 15% faster. You'd think a difference that big would show up in the
>>>benchmark score but it doesn't. Or are you going to claim that Crafty always
>>>uses memory that it hasn't used for the last ~512k?
>>
>>I agree with the math when the L2 cache speed is the same speed. However you
>>must *prove* that first. There are big differences even from processor line to
>>processor line.
>>
>>I remember that northwood P4 was a lot faster for DIEP than the generation P4
>>before and yet everyone is doing as if it is the same processor even. For DIEP
>>there was a 20% difference between P3 and P2 speed, for Fritz it didn't matter
>>anything. And so on.
>>
>>>>Only _real_ way is to test with larger sizes as well. I've done that up to 2mb
>>>>and saw improvement from 512 to 1024K and from 1024K to 2048K, on older xeons.
>>>
>>>Sure. And if other people run those tests and don't see a difference, you'd say
>>>a chip needed a 4MB/8MB cache for anybody to be sure.
>>
>>You are correct here. That's how Bob works.
>
>Nah. That is how _you_ work. Give one example and claim it 'proofs' your
>point. One example doesn't prove something. It can disprove a claim easily
>enough however. You've been burned enough times that way to understand...

Try to disprove that Crafty's working set size is under 384KB.

>
>>
>>>I have easy access to 2GHz Athlon 64s with 512k and 1MB cache... if somebody can
>>>point me to a Windows Crafty executable and tell me what to type, I'll happily
>>>run the test.
>>
>>>-Tom
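
The average-access-time estimate quoted above can be checked with a few lines of Python. This is only a sketch of Tom's simplified model: it assumes accesses are spread uniformly over the working set, so the hit rate is just cache size divided by working set size, and it uses his illustrative 5ns and 50ns latencies. The function name and parameters are made up for this example and do not come from Crafty or any benchmark.

```python
def average_access_ns(cache_kb, working_set_kb, cache_ns=5.0, memory_ns=50.0):
    """Average access time under a uniform-access model (an assumption).

    The hit fraction is cache size / working set size, capped at 1.0
    for the case where the whole working set fits in the cache.
    """
    hit_fraction = min(cache_kb / working_set_kb, 1.0)
    return hit_fraction * cache_ns + (1.0 - hit_fraction) * memory_ns


# The two cases from the post: a 2MB working set with 256k and 512k of L2.
small = average_access_ns(256, 2048)   # 0.125 * 5 + 0.875 * 50
large = average_access_ns(512, 2048)   # 0.25 * 5 + 0.75 * 50
speedup = small / large - 1.0          # relative gain from the larger cache

print(f"256k cache: {small:.3f} ns")
print(f"512k cache: {large:.3f} ns")
print(f"512k is about {speedup:.1%} faster")
```

Under these assumptions the model gives 44.375ns and 38.75ns, matching the figures in the post, with the larger cache roughly 14.5% faster. It says nothing about whether the two L2 caches really have the same latency, which is the point raised in a1) above.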
Last modified: Thu, 15 Apr 21 08:11:13 -0700