Computer Chess Club Archives



Subject: Re: Is CPU L3 cache important for chess programs ?

Author: Vincent Diepeveen

Date: 13:06:57 11/23/05


On November 21, 2005 at 03:55:19, Yar wrote:

>Hi !
>
>Is CPU L3 cache important for chess programs ?
>
>Thank you
>
>Yar

The L1 is the most important cache. A big L1 cache is what chess programs need.
A 128KB L1 (64KB data cache) will most definitely outperform an 8KB L1 data
cache.

Examples:

Suppose with a 128KB L1 cache you can get a 95% hit ratio. That means only 5%
has to come from L2 cache.

An A64 at 2 GHz generates 2 billion clock ticks, or cycles, a second.
Getting something out of L1 takes 3 cycles, and you can do 2 reads
simultaneously from L1. Compare that with the P4, which has an 8KB L1 cache or
so (newer ones double that size) and takes up to 4 cycles on Prescott to get
data out of the L1 data cache. And it can read only 1 item at a time.

So if you get, for example, a 95% hit ratio on the A64 and 85% on the P4, then
the P4 has to work harder: it goes to L2 for 15% of its reads instead of 5%.

The L2 cache then catches the majority of all the other reads. It's usually a
slower cache, too. On an A64 or Opteron it takes just 13 cycles to read
something from the L2 cache, and you can do many reads simultaneously. The P4
is a lot slower. Each P4 has a different L2 speed. I forgot the exact speeds.
30 cycles or so for Prescott to read something out of L2?
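
To put rough numbers on the examples above, here is a minimal sketch of the
average cost per read. It assumes that every L1 miss is served by L2 (nothing
goes to memory) and it ignores that reads can overlap. The function name is
made up for illustration, and the 85% hit ratio and 30-cycle L2 for the P4 are
the guessed figures from the text, not measurements.

```python
def average_access_cycles(l1_hit_ratio, l1_cycles, l2_cycles):
    """Average cycles per read, assuming every L1 miss is served by L2."""
    return l1_hit_ratio * l1_cycles + (1.0 - l1_hit_ratio) * l2_cycles


# A64: 95% L1 hits at 3 cycles, the rest from L2 at 13 cycles.
a64 = average_access_cycles(0.95, 3, 13)

# P4 Prescott: 85% L1 hits at 4 cycles, the rest from L2 at roughly 30 cycles.
p4 = average_access_cycles(0.85, 4, 30)

print(f"A64: {a64:.2f} cycles per read")
print(f"P4:  {p4:.2f} cycles per read")
```

Under these assumptions the A64 averages about 3.5 cycles per read and the P4
about 7.9, so the smaller L1 costs more than the gap in hit ratio alone
suggests.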

So you will realize already that it is very important to not have things in L2.
Now Itanium2 has only 256KB L2 usually and A64 has 1024KB.

So basically Itanium2 usually *needs* an L3 cache.

Chess programs either fit within L1 or fit fine within 512KB.

Only hash table probes usually miss the caches and go straight to the memory
controller. So having an on-die memory controller is very helpful too, because
Itanium2 needs 280 nanoseconds to serve you, whereas the A64 can deliver
within 91 nanoseconds (that's a 2.2 GHz A64, by the way; faster for faster
ones). An Opteron with a big buffer and ECC + registered memory serves at
around 111 nanoseconds, which at 2.0 GHz == 222 cycles. That's with TLB
thrashing.
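
The conversion from nanoseconds to cycles used above is just latency times
clock frequency. A small sketch, with a hypothetical helper name; the Itanium2
figure is left out because the post gives no clock speed for it.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a memory latency in nanoseconds to CPU cycles.

    1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# Opteron with ECC + registered memory: 111 ns at 2.0 GHz.
print(f"Opteron 2.0 GHz: {latency_in_cycles(111, 2.0):.0f} cycles")

# A64 at 2.2 GHz: 91 ns.
print(f"A64 2.2 GHz:     {latency_in_cycles(91, 2.2):.0f} cycles")
```

So a hash table probe that misses every cache costs on the order of 200 cycles
on these machines, against 3 cycles for an L1 hit.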

So you'll realize that an L3 cache is just useless for chess software unless
your L2 is really tiny. It's too slow to get data from that cache anyway, and
chess programs are simply not that big. With really ugly tiny L2 caches, of
course, a big L3 won't hurt. Which is the case with Intel. So it's useful for
them, because in many test sets for processors a big L3 helps a lot. It doesn't
in chess software.

A big L3 is especially useful when multiprocessing. Thanks to the huge combined
L3 cache sizes, scaling on Itanium2 is simply 4.0 for Diep on a 4-processor
Itanium2.

Versus 3.92 and 3.93 for a dual-core dual Opteron and a single-core quad
Opteron, respectively.
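
For comparison, those speedups can be turned into parallel efficiency (speedup
divided by the number of processors). This is a sketch with an illustrative
function name; it assumes 4 cores in each configuration, which is what "dual
core dual" and "quad single core" add up to.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear scaling achieved."""
    return speedup / processors


# Speedups quoted above for Diep, each on 4 cores in total.
results = [
    ("4x Itanium2", 4.0),
    ("dual-core dual Opteron", 3.92),
    ("single-core quad Opteron", 3.93),
]

for name, speedup in results:
    print(f"{name}: {parallel_efficiency(speedup, 4):.2%}")
```

All three configurations land at 98% efficiency or better, so the difference
the large L3 makes here is small.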

As you can see from the benchmarks Johan de Gelas did with Diep, at the time
for www.aceshardware.com (nowadays he's at anandtech.com), a bigger L3 cache
didn't matter at all for Diep. Perhaps it matters some far behind the decimal
point, but he couldn't measure it.

Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700
