Author: Gerd Isenberg
Date: 06:24:35 12/07/03
On December 06, 2003 at 18:08:01, Sven Reichard wrote:

>Hi guys,
>
>once more, I got surprised by the computer (although in a positive way). Maybe
>somebody can point out a flaw in my thinking?
>
>In order to practice assembler programming, I wrote a lengthy bitboard
>procedure. (For those interested, it's an Othello move generator, taking own and
>other pieces as inputs, and producing legal moves as output.)
>
>It was all MMX code, and I use an Athlon. From literature I got the following
>information:
>- Most MMX instructions have a latency of 2 cycles; none has less.
>- There are 2 MMX pipelines.
>From this I deduce that with optimal scheduling avoiding hazards and cache
>misses, this code should execute at 1 instruction/cycle.
>In fact, it issues about 1.5 instructions per cycle, completing the 200 odd
>instructions in 133 cycles.
>
>Does anybody have an idea what's going on here? Maybe the documentation (taken
>from AMD's web site) is outdated? Have they added a third pipeline? Or did I
>misunderstand something?
>
>Thanks for your hints,
>Sven.

Hi Sven,

I had a similar experience with the Athlon XP and MMX fill algorithms: with four independent instruction chains, I got up to two MMX instructions per cycle. That is a factor-of-four speedup for pipelined, parallel code over sequential code.

I'm not quite sure what exactly instruction latency means. I guess it is the time to decode plus the time to execute the instruction. I further guess that the pure MMX ALU execute latency is only one cycle or less (there is also a third load/store unit), and that decoding and execution of different instructions overlap for DirectPath instructions, as opposed to VectorPath instructions, which exclusively block all decode and execution units.

Gerd
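To make "four independent instruction chains" concrete, here is a minimal sketch in plain C rather than MMX assembler. It is not my actual fill code; the names fill_north, fill4_interleaved and Fills are made up for illustration, and the square mapping (a1 = bit 0, h8 = bit 63) is an assumption. Within one direction every step depends on the previous result, so that chain is bound by latency. The four directions do not depend on each other, so interleaving them gives the scheduler independent work to overlap. How much overlap you really get depends on the compiler and the CPU.

```c
#include <stdint.h>

/* Masks that stop east/west shifts from wrapping around the board edge.
   Assumed mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56. */
#define NOT_A_FILE 0xfefefefefefefefeULL
#define NOT_H_FILE 0x7f7f7f7f7f7f7f7fULL

/* A single dependency chain: each step needs the previous value of gen.
   gen = generator squares, pro = propagator (squares that may be filled). */
uint64_t fill_north(uint64_t gen, uint64_t pro)
{
    for (int i = 0; i < 7; i++)
        gen |= (gen << 8) & pro;
    return gen;
}

typedef struct { uint64_t n, s, e, w; } Fills;

/* Four independent chains interleaved in one loop body.
   No statement in an iteration reads the result of another statement
   in the same iteration, so the four updates can be in flight together. */
Fills fill4_interleaved(uint64_t gen, uint64_t pro)
{
    uint64_t n = gen, s = gen, e = gen, w = gen;
    uint64_t pe = pro & NOT_A_FILE;  /* east shift must not land on the a-file */
    uint64_t pw = pro & NOT_H_FILE;  /* west shift must not land on the h-file */

    for (int i = 0; i < 7; i++) {
        n |= (n << 8) & pro;
        s |= (s >> 8) & pro;
        e |= (e << 1) & pe;
        w |= (w >> 1) & pw;
    }
    return (Fills){ n, s, e, w };
}
```

Calling fill4_interleaved with a single generator bit on a1 and a full propagator fills the a-file to the north and the first rank to the east, while the south and west fills stay on a1. Running the four directions one after another with functions like fill_north gives the same results, only as four latency-bound chains in sequence.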
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.