Computer Chess Club Archives


Subject: Re: Why is P4 less efficient than Athlon (or P3) for chess programs ?

Author: Tom Kerrigan

Date: 17:11:24 07/03/03



On July 03, 2003 at 19:38:29, Robert Hyatt wrote:

>On July 03, 2003 at 16:48:07, Tom Kerrigan wrote:
>
>>On July 03, 2003 at 16:23:10, Russell Reagan wrote:
>>
>>>On July 03, 2003 at 15:02:55, Joachim Rang wrote:
>>>
>>>>The main reason is, that Athlon and P3 have 9 instructions per cycle and P4 has
>>>>only 6.
>>>
>>>Also the length of the pipeline on the P3 is 10, which means that a mispredicted
>>>branch costs 10 cycles. On the P4 the length of the pipeline is 20, which means
>>>it costs 20 cycles for a mispredicted branch. I may be wrong about the actual
>>>numbers (10 and 20, but I think they are close). I'm not sure what the length is
>>>on the Athlon. Anyone know?
>>
>>Pentium 3: 12 cycles
>>Pentium 4: 20 cycles
>>Athlon: 10 cycles
>>Opteron/Athlon 64: 12 cycles
>>
>>In addition to unpredictable branches and parallelism, the P4 also has 8k of L1
>>cache vs. the Athlon's 64k. The P4's cache is faster, but that may not make up
>>for the difference in size with typical chess programs.
>>
>>-Tom
>
>That's slightly inaccurate.  The PIV has 8k of L1 data cache, but it also
>has a 12K micro-op trace cache that is also L1.  The 12K is the number of
>micro-ops Intel claims it stores, but I have never found them saying how
>many "bytes" that turns into.  But I would imagine it is at least 24K if not
>more as 1-byte instructions are pretty rare I'd think, in a RISCY-type
>internal architecture.

Well, IIRC, uops are around 240 bits, so 12K of them come to roughly 360 kbytes of memory,
which seems like it's in the right ballpark given die photos. But the number of
instructions it stores is more interesting than the amount of memory the
implementation requires. The Athlon has a 64k L1 instruction cache which can
store ~21k instructions if you figure x86 instructions average 3 bytes (just a
guess given code sizes of x86 vs RISC programs).
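
A quick way to sanity-check the arithmetic above is the minimal Python sketch below. Every input is an estimate taken from this thread: the 240-bit uop size is recalled from memory, the 3-byte average x86 instruction length is a guess, and "12K" and "64k" are taken to mean 12 * 1024 and 64 * 1024. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope capacity estimates. All inputs are rough figures
# quoted in this thread, not vendor-published specifications.

TRACE_CACHE_UOPS = 12 * 1024   # "12K micro-ops" (assumed to mean 12 * 1024)
UOP_BITS = 240                 # recalled uop width; an assumption
ATHLON_L1I_BYTES = 64 * 1024   # Athlon L1 instruction cache, "64k"
AVG_X86_INSN_BYTES = 3         # guessed average x86 instruction length


def trace_cache_bytes(uops=TRACE_CACHE_UOPS, uop_bits=UOP_BITS):
    """Raw storage implied by holding `uops` micro-ops of `uop_bits` bits each."""
    return uops * uop_bits // 8


def instruction_capacity(cache_bytes=ATHLON_L1I_BYTES,
                         avg_insn_bytes=AVG_X86_INSN_BYTES):
    """Approximate number of instructions that fit in a byte-addressed cache."""
    return cache_bytes // avg_insn_bytes


if __name__ == "__main__":
    print(trace_cache_bytes())          # 368640 bytes, i.e. 360 * 1024
    print(trace_cache_bytes() // 1024)  # 360
    print(instruction_capacity())       # 21845 instructions
```

Under those assumptions the trace cache works out to about 360 kbytes of storage and the Athlon's instruction cache to roughly 21k to 22k instructions. Changing either guess (uop width or average instruction length) shifts the results proportionally.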

So it seems like the Athlon has a much bigger L1 instruction cache, but really I
don't think the size of an instruction cache matters as long as it's big enough
to give you the typical > 99.9% hit rate.

BTW, 1 byte instructions will never occur on any sort of RISCy architecture.
RISC = fixed length instructions, remember?

-Tom




Last modified: Thu, 15 Apr 21 08:11:13 -0700
