Computer Chess Club Archives

Subject: Re: Why AMD just don`t blow Intel...

Author: Gerd Isenberg
Date: 11:41:48 02/10/04
On February 09, 2004 at 16:53:40, Vincent Diepeveen wrote:

>On February 09, 2004 at 15:14:53, Anthony Cozzie wrote:
>
>>On February 09, 2004 at 14:23:55, Aloisio Ponti Lopes wrote:
>>
>>>... by releasing their processors at the same speed in GHz?
>>>
>>>As a consumer I can't understand that. It seems to me that AMD's problem is the
>>>heat issue... so the important thing to do when buying an AMD processor is to
>>>liquid-cool it or get some sort of special (refrigerated) case to build the
>>>system, if you want to push it to the limit by overclocking?
>>>
>>>I was also looking for a good notebook as I have 2 medical offices now, and
>>>buying another desktop was not my idea... but I began to search for an AMD
>>>notebook, and... guess what?! It's really difficult to find one here (Brazil)...
>>>there are Toshibas everywhere, from Celeron to Pentium 4, and of course there
>>>are the new Centrinos with Wi-Fi (from Acer too), but they're extremely
>>>expensive here... so the AMD processors rock for chess, but their marketing
>>>sucks; I could only find an HP XP 2400+ (2.0 GHz) with DVD/CD-RW and 512 MB RAM.
>>>Only one model. No other models or options to compare...
>>>
>>>Maybe it's time for AMD to look for a smarter CEO or at least someone to put
>>>some fire on the market, like Steve Jobs (Apple) or Lee Iacocca (Chrysler) did
>>>some years ago...
>>>
>>>A. Ponti
>>
>>It is very interesting to see what the average non-architecture guy thinks :)
>>
>>Modern CPUs break the execution of an instruction into many parts.  More parts =
>>deeper pipeline -> faster clocks -> more GHz.  However, the deeper the pipeline,
>>the more branch mispredictions hurt you, and your memory isn't getting any
>>faster, etc.
>>
>>I think I am going to try to write an article "understanding modern superscalar
>>pipelines for the uninitiated" because there are so many people here that don't
>>understand why P4 isn't faster than Opteron even though it's clocked faster.
>>
>>anthony
>
>Actually the L2 cache of the Opteron is clocked higher than that of any P4. So it
>isn't even true.
>

Sorry Vincent,

I don't get it. The L2 cache clock: can you explain a bit?

Between the L1 data cache and the ALUs/AGUs there is a Load/Store Unit, with
LS1 (pre-cache) and LS2 (post-cache). Are you perhaps confusing L2 with LS2
(which relates purely to the data cache)?

http://www.chip-architect.com/
Understanding the detailed Architecture of AMD's 64 bit Core
3.6 The Load/Store Unit, LS1 and LS2

>Based upon decoding speed from code to trace cache the P4 at 3.2 GHz is 3.2 GHz
>clocked and the Opteron at 2.2 GHz is 6.6 GHz clocked :)

But doesn't the P4's trace cache already contain decoded micro-ops?
Do you mean because of the Opteron's ability to decode three instructions into
macro-ops per cycle?
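
If so, the 6.6 GHz figure would just be clock rate times decode width. A minimal Python sketch of that arithmetic, assuming three x86 instructions decoded per cycle on the Opteron and one per cycle on the P4 when it fills the trace cache (both widths are my assumptions, not numbers from Vincent's post; the function name is made up for illustration):

```python
def decode_equivalent_ghz(clock_ghz, decodes_per_cycle):
    """Clock rate scaled by the number of x86 instructions decoded per cycle."""
    return clock_ghz * decodes_per_cycle

# Assumed decode widths: 3 per cycle (Opteron), 1 per cycle (P4 into trace cache).
opteron = decode_equivalent_ghz(2.2, 3)
p4 = decode_equivalent_ghz(3.2, 1)

print(f"Opteron: {opteron:.1f} GHz-equivalent, P4: {p4:.1f} GHz-equivalent")
```

This only compares raw decode rate; it says nothing about how often the P4 can skip decoding entirely by hitting in the trace cache.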


What I found amazing about the Opteron is the following:

Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors,
Chapter 9, "Optimizing with SIMD Instructions", pp. 227/228:

Instruction 0     2     4     6     8     10    12    14
MOVQ        xxxxxx
PSWAPD      xxxxxx
PFMUL             xxxxxxxxxxxxxxxxxx
PFMUL                xxxxxxxxxxxxxxxxxx
PFPNACC                                xxxxxxxxxxxxxxxxxxx

This takes 15 cycles; the next one takes 17 cycles:

Instruction 0     2     4     6     8     10    12    14    16    18
MOVQ        xxxxxx
MOVQ        xxxxxx
MOVQ           xxxxxx
MOVQ           xxxxxx
PSWAPD            xxxxxx
PSWAPD            xxxxxx
PSWAPD               xxxxxx
PSWAPD               xxxxxx
PFMUL             xxxxxxxxxxxxxxxxxx
PFMUL                xxxxxxxxxxxxxxxxxx
PFMUL                   xxxxxxxxxxxxxxxxxx
PFMUL                      xxxxxxxxxxxxxxxxxx
PFMUL                         xxxxxxxxxxxxxxxxxx
PFMUL                            xxxxxxxxxxxxxxxxxx
PFMUL                               xxxxxxxxxxxxxxxxxx
PFMUL                                  xxxxxxxxxxxxxxxxxx
PFPNACC                             xxxxxxxxxxxxxxxxxxx
PFPNACC                                xxxxxxxxxxxxxxxxxxx
PFPNACC                                   xxxxxxxxxxxxxxxxxxx
PFPNACC                                      xxxxxxxxxxxxxxxxxxx
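
For readers who don't know the 3DNow! mnemonics, here is a rough Python model of what one PSWAPD / PFMUL / PFMUL / PFPNACC group computes. The instruction semantics are as I understand them from the 3DNow! documentation; the operand order and the helper names are my own assumptions for illustration, not the exact register allocation from the AMD guide. A 64-bit MMX register is modelled as a (low, high) pair of floats, with a complex number stored as (real, imaginary).

```python
def pswapd(src):
    """PSWAPD: swap the low and high 32-bit halves."""
    low, high = src
    return (high, low)

def pfmul(dst, src):
    """PFMUL: packed single-precision multiply, element by element."""
    return (dst[0] * src[0], dst[1] * src[1])

def pfpnacc(dst, src):
    """PFPNACC: low = dst.low - dst.high, high = src.low + src.high."""
    return (dst[0] - dst[1], src[0] + src[1])

def complex_mul(x, y):
    """Multiply x = a + b*i by y = c + d*i using the modelled instructions."""
    swapped = pswapd(y)               # (d, c)
    straight = pfmul(x, y)            # (a*c, b*d)
    crossed = pfmul(x, swapped)       # (a*d, b*c)
    return pfpnacc(straight, crossed) # (a*c - b*d, a*d + b*c)

print(complex_mul((1.0, 2.0), (3.0, 4.0)))  # (1+2i)(3+4i) -> (-5.0, 10.0)
```

The MOVQ in the diagram would correspond to copying an operand first, since PFMUL overwrites its destination register.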


"Multiplying four complex single-precision numbers only takes 17 cycles as
opposed to 14 cycles to multiply one complex single-precision number. The
floating-point pipes are kept busy by feeding new instructions into the
floating-point pipeline each cycle. In the arrangement above, 24 floating-point
operations are performed in 17 cycles, achieving more than a 3.5x increase in
performance."
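
The numbers in the quote can be checked with a little arithmetic. The sketch below assumes one complex multiplication counts as 6 floating-point operations (4 multiplies plus 2 adds/subtracts), which is where 24 operations for four multiplications would come from. It computes the speedup for both the 14-cycle figure in the quoted sentence and the 15 cycles I read from the first diagram; the function names are just for illustration.

```python
FLOPS_PER_COMPLEX_MUL = 6  # assumed: 4 multiplies + 2 adds/subtracts

def flops_per_cycle(n_muls, cycles):
    """Floating-point operations per cycle for n_muls complex multiplications."""
    return n_muls * FLOPS_PER_COMPLEX_MUL / cycles

def speedup(single_cycles, batch_cycles=17, batch_size=4):
    """Throughput of the interleaved batch relative to a single multiplication."""
    return flops_per_cycle(batch_size, batch_cycles) / flops_per_cycle(1, single_cycles)

for single in (14, 15):
    print(f"single multiply in {single} cycles -> speedup {speedup(single):.2f}x")
```

With 15 cycles for the single case the ratio comes out at about 3.53x, which matches "more than 3.5x"; with 14 cycles it is about 3.29x, so the guide's speedup figure seems to be based on the 15-cycle count.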

See you soon,
Gerd