Author: Vincent Diepeveen
Date: 13:06:57 11/23/05
On November 21, 2005 at 03:55:19, Yar wrote:

>Hi !
>
>Is CPU L3 cache important for chess programs ?
>
>Thank you
>
>Yar

The L1 is the most important cache. A big L1 cache is what chess programs need. A 128KB L1 (64KB of it data cache) will most definitely outperform an 8KB L1 data cache.

Example: suppose that with a 128KB L1 cache you get a 95% hit ratio. That means only 5% of reads have to come from the L2 cache.

An A64 at 2GHz generates 2 billion clock ticks (cycles) a second. Getting something out of L1 takes 3 cycles, and you can do 2 reads simultaneously from L1.

Compare that with the P4, which has an 8KB L1 data cache or so (newer ones have double that). On Prescott it takes up to 4 cycles to get data out of the L1 data cache, and it can read only 1 item at a time.

So if you get, for example, a 95% hit ratio on the A64 and 85% on the P4, the P4 has to work harder: three times as many of its reads have to go to L2.

The L2 cache then catches the majority of all other reads. It is a slower cache, too. On the A64 or Opteron it takes just 13 cycles to read something from L2, and you can do many reads simultaneously. The P4 is a lot slower. Each P4 model has a different L2 speed; I forget the exact numbers. 30 cycles or so for Prescott to read something out of L2?

So you will realize already that it is very important not to have to fetch things from L2.

Now Itanium2 usually has only 256KB of L2 and the A64 has 1024KB. So basically Itanium2 *needs* an L3 cache.

Chess programs either fit within L1 or fit fine within 512KB. Only the hashtable probes usually fall outside the caches and go straight to the memory controller.

So having an on-die memory controller is very helpful too, because Itanium2 needs 280 nanoseconds to serve you, whereas the A64 can deliver within 91 nanoseconds (that's a 2.2GHz A64, by the way; faster ones are quicker). An Opteron with a big buffer and ECC registered memory serves at around 111 nanoseconds, which at 2.0GHz == 222 cycles. That figure includes TLB thrashing.

So you'll realize that an L3 cache is just useless for chess software unless your L2 is really tiny.
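To put rough numbers on the hit-ratio argument, here is a minimal sketch of the arithmetic in Python. It assumes every L1 miss is served by L2 (hashtable probes that go to memory are left out), it ignores the A64's ability to do 2 L1 reads at once, and it takes the 30-cycle Prescott L2 latency as the rough guess it is. The function names are made up for this illustration.

```python
def average_read_cycles(l1_hit_ratio, l1_cycles, l2_cycles):
    """Expected cycles per read, assuming every L1 miss hits in L2."""
    l1_miss_ratio = 1.0 - l1_hit_ratio
    return l1_hit_ratio * l1_cycles + l1_miss_ratio * l2_cycles


def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles at the given clock."""
    return latency_ns * clock_ghz


# Figures quoted in the post; the P4 L2 latency is only approximate.
a64_cycles = average_read_cycles(0.95, l1_cycles=3, l2_cycles=13)
p4_cycles = average_read_cycles(0.85, l1_cycles=4, l2_cycles=30)

# Opteron with ECC registered memory: 111 ns at 2.0GHz.
opteron_memory_cycles = ns_to_cycles(111, 2.0)

if __name__ == "__main__":
    print(f"A64 average read: {a64_cycles:.2f} cycles")
    print(f"P4 average read:  {p4_cycles:.2f} cycles")
    print(f"Opteron memory:   {opteron_memory_cycles:.0f} cycles")
```

Under these assumptions the A64 averages about 3.5 cycles per read and the P4 about 7.9, while a single trip to memory costs 222 cycles on the Opteron. The per-cycle numbers are not directly comparable across chips running at different clocks, so treat this as an illustration of the trend only.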
An L3 cache is too slow to get data from anyway, and chess programs are simply not that big. With really tiny L2 caches, of course, a big L3 won't hurt, which is the case with Intel. It's useful for them because in many processor test sets a big L3 helps a lot. It doesn't in chess software.

It's especially useful when multiprocessing. Thanks to the huge combined L3 cache size, scaling for Diep on a 4-processor Itanium2 is simply 4.0, versus 3.92 and 3.93 for a dual-core dual Opteron and a single-core quad Opteron respectively.

As you can see from the benchmarks Johan de Gelas did with Diep at what was then www.aceshardware.com (nowadays he's at anandtech.com), a bigger L3 cache didn't matter at all for Diep. Perhaps it matters somewhere far behind the decimal point, but he couldn't measure it.

Vincent
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.