Author: Anthony Cozzie
Date: 16:45:24 12/06/03
On December 06, 2003 at 18:13:52, Sven Reichard wrote:
>On December 06, 2003 at 18:11:02, Anthony Cozzie wrote:
>
>>On December 06, 2003 at 18:08:01, Sven Reichard wrote:
>>
>>>Hi guys,
>>>
>>>once more, I got surprised by the computer (although in a positive way). Maybe
>>>somebody can point out a flaw in my thinking?
>>>
>>>In order to practice assembler programming, I wrote a lengthy bitboard
>>>procedure. (For those interested, it's an Othello move generator, taking own and
>>>other pieces as inputs, and producing legal moves as output.)
>>>
>>>It was all MMX code, and I use an Athlon. From literature I got the following
>>>information:
>>>- Most MMX instructions have a latency of 2 cycles; none has less.
>>>- There are 2 MMX pipelines.
>>>From this I deduce that with optimal scheduling avoiding hazards and cache
>>>misses, this code should execute at 1 instruction/cycle.
>>>In fact, it issues about 1.5 instructions per cycle, completing the 200 odd
>>>instructions in 133 cycles.
>>>
>>>Does anybody have an idea what's going on here? Maybe the documentation (taken
>>>from AMD's web site) is outdated? Have they added a third pipeline? Or did I
>>>misunderstand something?
>>>
>>>Thanks for your hints,
>>>Sven.
>>
>>The Athlon's MMX ALUs are fully pipelined, meaning that they can retire 1
>>instruction/clock each.
>>
>>anthony
>
>So the 2 cycle latency stated in the Optimization Guide is incorrect?
>
>Sven.
cycle 1:
I1
cycle 2:
I2 I1
cycle 3:
I3 I2 (I1 retires)
cycle 4:
I4 I3 (I2 retires)
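This assumes no data dependencies, of course. The latency is still 2 cycles, but
because each pipe can start a new instruction every cycle, the throughput is 1
instruction/cycle per pipe.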
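
A rough way to check the arithmetic is a toy scheduler. This is only a sketch of an idealized machine, not a model of the real Athlon: the function name `simulate`, its parameters, and the oldest-ready-first issue rule are all made up for illustration. It compares 2 pipes that are fully pipelined against 2 pipes that stay busy for the whole 2-cycle latency, which is the assumption behind the 1 instruction/cycle estimate.

```python
def simulate(deps, pipes=2, latency=2, pipelined=True):
    """Return the number of cycles until every result is available.

    deps[i] lists the instructions whose results instruction i needs.
    Assumption: deps[i] only refers to earlier instructions (indices < i),
    otherwise the loop below would never finish.
    """
    n = len(deps)
    ready_at = [None] * n    # cycle at which each result becomes available
    pipe_free = [0] * pipes  # first cycle at which each pipe accepts new work
    pending = list(range(n))
    cycle = 0
    while pending:
        for p in range(pipes):
            if pipe_free[p] > cycle:
                continue  # this pipe cannot start an instruction this cycle
            # Issue the oldest pending instruction whose inputs are all ready.
            for i in pending:
                if all(ready_at[d] is not None and ready_at[d] <= cycle
                       for d in deps[i]):
                    ready_at[i] = cycle + latency
                    # A pipelined unit accepts new work next cycle; a
                    # non-pipelined one is blocked for the full latency.
                    pipe_free[p] = cycle + (1 if pipelined else latency)
                    pending.remove(i)
                    break
        cycle += 1
    return max(ready_at, default=0)


if __name__ == "__main__":
    independent = [[] for _ in range(200)]                # no dependencies
    chain = [[]] + [[i - 1] for i in range(1, 200)]       # each needs the previous
    print(simulate(independent))                   # 101 cycles, about 2 instr/cycle
    print(simulate(independent, pipelined=False))  # 200 cycles, 1 instr/cycle
    print(simulate(chain))                         # 400 cycles, 0.5 instr/cycle
```

Under these assumptions, 200 instructions in 133 cycles falls between the fully independent case (101 cycles) and the fully dependent chain (400 cycles), which is what you would expect from code with some, but not many, dependency stalls.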
anthony