Computer Chess Club Archives



Subject: Re: Importance of L2 cache speed/size for diff programs (was:..Genius speed?

Author: fca

Date: 04:59:04 08/13/98



On August 12, 1998 at 13:55:01, Tom Kerrigan wrote:

>On August 12, 1998 at 10:44:40, fca wrote:
>
>>You wrote:
>>"Aside from software differences, the Pentium MMX/200 has a 66MHz L2 cache
>>(possibly smaller than 512k) whereas the Pentium II/300 has a 512k 150MHz L2
>>cache. If a program really bangs on the L2 cache, it will go much faster on the
>>Pentium II."
>>But since the core (P2/300 vs P200MMX) is so much faster, the extra/faster cache
>>(even if accessed a lot) simply serves to alleviate what the faster core would
>>*otherwise* have made into a bottleneck.  L2-hit rates etc suggest in itself
>>this would not be able to increase the speed ratio above that the cores deliver.
>
>Sort of. Notice that the 200/66 core clock speed/L2 cache speed ratio is much
>less than the 300/150 ratio, so if L2 cache is a bottleneck on the Pentium
>MMX/200, then it's less of a bottleneck on the Pentium II/300.

But...

1. Comparing cores is not just a matter of comparing MHz, as of course you know.
The P6 is more efficient (ignoring caching considerations) in many subtle
ways.

2. Of course it is not efficient enough to offset a 3:1 vs 2:1 difference (core
MHz to L2 speed ratios)... but a poster whose technical viewpoint is one with
which I usually agree stated that:
'The P6 core is designed to be less dependent on the L2 cache, too.'

;-)

I believe 1. & 2. together make my "case" as pasted above by you...

>>So, I am still surprised at the 2.5x reported.  Aren't you?

Actually, thanks to clarification by blass as to what he was running (the effect
of the 16-bit F5 "harness" on J) and some benchmarks kindly posted by Amir, we
are not headed for 2.5x any more.  I view this discussion as more of a "are we
correctly understanding the effect of core changes and L2 changes on chess nps"
one.

>Not really. Consider this:

>Pentium MMX/200 = 1
>Pentium MMX/300 = 1.5 (assume linear scaling)
>Pentium MMX/300 * 1.66 = 2.5 (66% improvement from P5 -> P6 core)

Ah - you are dropping the "L2-hit-hard" theory now?  Perhaps not. :-)

In respect of the above scalings:

As I do not believe there was an MMX300, only a P2/300 (ignoring Celeron), I
interpret the above as 2.5 = (300/200 MHz linear scaling) x 1.66 (a claimed P5
-> P6 core change effect).  I realise you wrote it this way to progress from
200MMX to P2/300 in steps.

I disagree with the validity of claiming linear scaling (i.e. MHz dependence)
with the same core type and L2 size/speed.  It is *simply not* backed up by the
evidence (hosts of it, from the benchmarks at www.intel.com).  It assumes no
bottlenecks.  I can't quote you data for MMX200:MMX300 (no such CPU), so I
choose the closest P5 equivalent, MMX166:MMX233.

For all these, the m/b bus operated at the same 66MHz, just like with the
original.

             SPECint (base)95 - Unix    SPECint 95 - NT40

MMX166               5.60                     5.54
MMX233               7.12                     7.02
P2/233               9.47                     9.44
P2/333              12.8                     12.7

Other non-f.p. benchmarks make my "case" even better.

We first consider:

MMX233:MMX166       1.27x       1.27x

But the MHz ratio = 233/166 = 1.40x   ;-)

And of course the benchmarks do not stress L2 etc. as a chess program might.  If
they did, hmmmm.  The L2 cache speed (66MHz) is the same for both MMX166 and
MMX233, but while the size is 512K for the 233 it _might_ be just 256K for the
MMX166.  If the MMX166 also had 512K, the stress on L2 would obviously be higher
with the faster core, so 1.27 --> say 1.22 (just to put a number on it to avoid
confusion).  If it had 256K (which seems unlikely, else like was not being
compared with like), it is possible that the 1.27x applies, or maybe something
very marginally higher (size being less important than speed in chess use).

Comparing the MHz ratio for the P6 core (here, "proportionate" L2 speed, but no
size change):

P2/333:P2/233      1.35x         1.35x

But the MHz ratio = 333/233 = 1.43x   ;-)
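
For anyone who wants to re-run the arithmetic, here is a minimal Python sketch of the two same-core comparisons. The scores are copied from the table above; the names (SPECINT95, MHZ, scaling) are illustrative labels only, not anything from Intel or SPEC.

```python
# SPECint95 scores from the table above: (Unix base, NT 4.0).
SPECINT95 = {
    "MMX166": (5.60, 5.54),
    "MMX233": (7.12, 7.02),
    "P2/233": (9.47, 9.44),
    "P2/333": (12.8, 12.7),
}
# Core clock in MHz for each part.
MHZ = {"MMX166": 166, "MMX233": 233, "P2/233": 233, "P2/333": 333}


def scaling(fast, slow):
    """Return (Unix ratio, NT ratio, MHz ratio) for fast vs slow."""
    unix = SPECINT95[fast][0] / SPECINT95[slow][0]
    nt = SPECINT95[fast][1] / SPECINT95[slow][1]
    return unix, nt, MHZ[fast] / MHZ[slow]


for fast, slow in [("MMX233", "MMX166"), ("P2/333", "P2/233")]:
    unix, nt, mhz = scaling(fast, slow)
    print(f"{fast}:{slow}  SPECint {unix:.2f}x / {nt:.2f}x  vs MHz {mhz:.2f}x")
```

In both pairs the measured SPECint gain comes out below the clock ratio, which is the whole point being made against linear scaling.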

So in summary I suggest the 1.5x you quote is more like 1.35x in chess practice.

The 1.66x is also interesting.  The above table is useful here too.

P2/233 : MMX233     1.33x        1.34x

Here both had 512K L2, but the P2/233's worked at 116.5 MHz and the other at
66MHz.

Of course we remember that where all other things are equal (?!), L2 dependency
is *reduced* with the P6 (=P2).  So we have two factors suggesting that the
1.33x overstates it (same 512K size, reduced dependency) and one (L2 speed) that
suggests the opposite.  My guess is that the c. 1.34x holds.  This is supported by
Ed's Rebel benchmarks (different program, I know, but closer to Junior than is
SPECint!).  In any event, it appears inconceivable that the 1.34x could become
1.66x in a chess environment - chess programs simply don't do that sort of
thing.
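
As a rough check on how far apart 1.34x and 1.66x are, the sketch below (Python, using only the two 233 MHz rows of the SPECint table and the 1.66x figure quoted earlier in the thread; the variable names are illustrative) works out the extra factor a chess program would have to gain over SPECint for the 1.66x to hold.

```python
# SPECint95 scores at the same 233 MHz core clock, from the table above:
# (Unix base, NT 4.0).
MMX233 = (7.12, 7.02)   # P5 core, 512K L2 at 66 MHz
P2_233 = (9.47, 9.44)   # P6 core, 512K L2 at 116.5 MHz

CLAIMED_CORE_GAIN = 1.66  # the P5 -> P6 figure quoted earlier in the thread

for label, p6, p5 in zip(("Unix", "NT 4.0"), P2_233, MMX233):
    measured = p6 / p5
    # Extra factor needed on top of the SPECint gain to reach 1.66x.
    shortfall = CLAIMED_CORE_GAIN / measured
    print(f"{label}: measured {measured:.2f}x, needs a further {shortfall:.2f}x")
```

On these figures the chess workload would need to benefit roughly a quarter more from the P6 core and its faster L2 than SPECint does, which is the gap being called implausible here.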

>So it's not out of the question.

And I say 2.5x is, based on the above.  About 1.35^2, say 1.9x tops and that's
pushing it.  Bottlenecks are nasty things: bypass one, you just fall into the
next.
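
The two chains of multiplication being compared can be laid side by side in a short Python sketch. The 1.5x and 1.66x are the figures quoted from Tom above, the 1.35x is the SPECint-derived estimate used for both steps ("about 1.35^2"), and the variable names are purely illustrative.

```python
# Quoted chain: linear MHz scaling times a claimed P5 -> P6 core gain.
quoted_clock, quoted_core = 1.5, 1.66
# SPECint-derived chain: about 1.35x for each of the same two steps.
spec_step = 1.35

quoted_total = quoted_clock * quoted_core
spec_total = spec_step ** 2

print(f"quoted:  {quoted_clock} x {quoted_core} = {quoted_total:.2f}x")
print(f"SPECint: {spec_step} ^ 2 = {spec_total:.2f}x")
```

The second product lands a little above 1.8x, consistent with the "1.9x tops" ceiling and well short of 2.5x.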

So - over to you!

>-Tom

Kind regards

fca



Last modified: Thu, 15 Apr 21 08:11:13 -0700