Author: Sven Reichard
Date: 15:08:01 12/06/03
Hi guys,

Once more I got surprised by the computer (although in a positive way). Maybe somebody can point out a flaw in my thinking?

To practice assembler programming, I wrote a lengthy bitboard procedure. (For those interested, it's an Othello move generator that takes my own and the opponent's pieces as inputs and produces the legal moves as output.) It is all MMX code, and I run it on an Athlon.

From the literature I got the following information:
- Most MMX instructions have a latency of 2 cycles; none has less.
- There are 2 MMX pipelines.

From this I deduce that with optimal scheduling, avoiding hazards and cache misses, this code should execute at 1 instruction/cycle. In fact, it issues about 1.5 instructions per cycle, completing the 200-odd instructions in 133 cycles.

Does anybody have an idea what's going on here? Maybe the documentation (taken from AMD's web site) is outdated? Have they added a third pipeline? Or did I misunderstand something?

Thanks for your hints,
Sven.
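For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic in Python. It is only a toy model, not a description of the Athlon: the function names and the `pipelined` flag are made up for illustration. The non-pipelined case encodes the assumption behind the 1 instruction/cycle estimate (each pipeline stays busy for the full 2-cycle latency); the pipelined case is a hypothetical alternative in which a pipeline can start a new independent instruction every cycle.

```python
def ipc_bound(pipes: int, latency: int, pipelined: bool) -> float:
    """Upper bound on instructions per cycle for `pipes` identical units.

    Toy model (assumption, not taken from AMD's documentation):
    - non-pipelined: a unit is occupied for the full latency of an instruction
    - pipelined: a unit can accept a new independent instruction every cycle
    """
    cycles_per_issue = 1 if pipelined else latency
    return pipes / cycles_per_issue


def observed_ipc(instructions: int, cycles: int) -> float:
    """Measured throughput: instructions completed per cycle."""
    return instructions / cycles


if __name__ == "__main__":
    # Estimate from the post: 2 pipelines, 2-cycle latency, no overlap per pipe.
    print(ipc_bound(pipes=2, latency=2, pipelined=False))
    # Hypothetical alternative: the same 2 pipelines, each issuing every cycle.
    print(ipc_bound(pipes=2, latency=2, pipelined=True))
    # Measurement from the post: about 200 instructions in 133 cycles.
    print(round(observed_ipc(200, 133), 2))
```

The measured value of about 1.5 lies between the two bounds of this model, so the question comes down to which of the two assumptions about the pipelines actually holds.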