Computer Chess Club Archives

Subject: Re: Why is P4 less efficient than Athlon (or P3) for chess programs ?

Author: Tom Kerrigan
Date: 14:19:50 07/05/03
On July 04, 2003 at 23:26:56, Robert Hyatt wrote:

>On July 03, 2003 at 20:11:24, Tom Kerrigan wrote:
>
>>On July 03, 2003 at 19:38:29, Robert Hyatt wrote:
>>
>>>On July 03, 2003 at 16:48:07, Tom Kerrigan wrote:
>>>
>>>>On July 03, 2003 at 16:23:10, Russell Reagan wrote:
>>>>
>>>>>On July 03, 2003 at 15:02:55, Joachim Rang wrote:
>>>>>
>>>>>>The main reason is that Athlon and P3 have 9 instructions per cycle and P4 has
>>>>>>only 6.
>>>>>
>>>>>Also the length of the pipeline on the P3 is 10, which means that a mispredicted
>>>>>branch costs 10 cycles. On the P4 the length of the pipeline is 20, which means
>>>>>it costs 20 cycles for a mispredicted branch. I may be wrong about the actual
>>>>>numbers (10 and 20), but I think they are close. I'm not sure what the length is
>>>>>on the Athlon. Anyone know?
>>>>
>>>>Pentium 3: 12 cycles
>>>>Pentium 4: 20 cycles
>>>>Athlon: 10 cycles
>>>>Opteron/Athlon 64: 12 cycles
>>>>
>>>>In addition to unpredictable branches and parallelism, the P4 also has 8k of L1
>>>>cache vs. the Athlon's 64k. The P4's cache is faster, but that may not make up
>>>>for the difference in size with typical chess programs.
>>>>
>>>>-Tom
>>>
>>>That's slightly inaccurate.  The PIV has 8k of L1 data cache, but it also
>>>has a 12K micro-op trace cache that is also L1.  The 12K is the number of
>>>micro-ops Intel claims it stores, but I have never found them saying how
>>>many "bytes" that turns into.  But I would imagine it is at least 24K if not
>>>more as 1-byte instructions are pretty rare I'd think, in a RISCY-type
>>>internal architecture.
>>
>>Well, IIRC, uops are around 240 bits, so that's ~350 kbytes of memory,
>>which seems like it's in the right ballpark given die photos. But the number of
>>instructions it stores is more interesting than the amount of memory the
>>implementation requires. The Athlon has a 64k L1 instruction cache which can
>>store ~21k instructions if you figure x86 instructions average 3 bytes (just a
>>guess given code sizes of x86 vs RISC programs).
>>
>>So it seems like the Athlon has a much bigger L1 instruction cache but really I
>>don't think the size of an instruction cache matters as long as it's big enough
>>to give you the typical > 99.9% hit rate.
>
>I'm not sure about the "much bigger L1 instruction cache."
>
>I don't see why a micro-op would be 240 bits, but that's your number, so I'll
>take it at face value.  If the PIV has 350K of L1 instruction, how is that
>"less" than 64K?  Most instructions (X86) don't take but 1-2 uops according
>to Intel.

uops are huge compared to machine language instructions because they have to
encode all of the information necessary for the core to execute the instruction.

Asking how 350k of cache can be less than 64k is like asking how one ton of
uranium can produce more power than ten tons of coal. The Athlon stores x86
instructions in its icache, which are far denser than the uops the P4 stores.
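
As a rough check on the arithmetic in this exchange, here is a back-of-envelope sketch in Python. Every input is an estimate quoted in the thread, not a measured value: the 240-bit uop width is a recollection, the 3-byte average x86 instruction length is a guess, and the 1-2 uops per x86 instruction range is the figure attributed to Intel. The function names are made up for illustration.

```python
# Back-of-envelope comparison of L1 instruction storage, using only the
# estimates quoted in this thread. None of these inputs are measurements.

TRACE_CACHE_UOPS = 12 * 1024      # Intel's "12K micro-ops" figure for the P4
UOP_BITS = 240                    # recalled uop width (an "IIRC" number)
ATHLON_ICACHE_BYTES = 64 * 1024   # Athlon L1 instruction cache size
AVG_X86_INSN_BYTES = 3            # guessed average x86 instruction length


def trace_cache_bytes(uops=TRACE_CACHE_UOPS, uop_bits=UOP_BITS):
    """Raw storage the trace cache would need at the assumed uop width."""
    return uops * uop_bits // 8


def athlon_insn_capacity(cache_bytes=ATHLON_ICACHE_BYTES,
                         avg_insn_bytes=AVG_X86_INSN_BYTES):
    """Approximate number of x86 instructions the Athlon icache can hold."""
    return cache_bytes // avg_insn_bytes


def p4_insn_capacity(uops_per_insn, uops=TRACE_CACHE_UOPS):
    """Approximate x86 instructions the trace cache holds at a given uop ratio."""
    return int(uops / uops_per_insn)


if __name__ == "__main__":
    print("P4 trace cache storage:", trace_cache_bytes() // 1024, "KB")
    print("Athlon icache capacity:", athlon_insn_capacity(), "instructions")
    for ratio in (1, 2):
        print("P4 capacity at", ratio, "uop(s)/insn:",
              p4_insn_capacity(ratio), "instructions")
```

Under these assumptions the trace cache needs about 360 KB of raw storage yet holds somewhere between roughly 6k and 12k x86 instructions, while the 64 KB Athlon icache holds about 21k. That is the density point: more bytes, fewer instructions.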

>>BTW, 1 byte instructions will never occur on any sort of RISCy architecture.
>>RISC = fixed length instructions, remember?
>
>Not always.
>RISC == Really Invented by Seymour Cray.  :)
>And the CDC/Cray machines were not all fixed length, although most were.

Fine, 1 byte instructions will never occur on any sort of Reduced Instruction
Set Computing (a la Hennessy, Patterson) architecture.

-Tom
Last modified: Thu, 15 Apr 21 08:11:13 -0700