Author: Matt Taylor
Date: 11:01:56 01/19/03
Interesting to note that several of those routines rely on technically undefined behavior. Under the bsf instruction, the manual states that, "...If the contents of the source operand are 0, the contents of the destination operand are undefined." Conveniently, it seems that this works on all existing implementations.

A similar trick can be used with shifts. Integer shift instructions mask their shift count to avoid unnecessary work (to the low 5 bits for 32-bit operands). As a result, shifting a 32-bit operand by 32 is the same as shifting it by 0, and does not change the destination operand.

I probably won't optimize your code for Pentium 4. I was rather annoyed when some code I wrote executed about as fast on my Pentium 90 as it would on a high-end Pentium 4. All the old tricks are now expensive. Shifts have 4 clocks of latency. The add-with-carry instructions (adc/sbb) have a -throughput- of 2-3 clocks and a latency of 6-8 clocks. The setcc instruction has 5 clocks of latency. Every one of these instructions has a latency of 1 clock on Athlon and the original Pentium. They all execute with a throughput of up to 3 instructions per cycle (1/3 clock each) on Athlon and 2 instructions per cycle (1/2 clock each) on the original Pentium. Sigh.

I'll optimize it for Athlon since I am now most familiar with its rules, and I have tools to analyze the code. Taking a look now...

-Matt
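To make the two hardware quirks concrete, here is a rough C sketch that models them explicitly rather than relying on them. The names shl32_x86 and bsf32_checked are made up for illustration and do not come from the routines under discussion. In C itself, shifting a 32-bit value by 32 is undefined, so the sketch applies the 5-bit mask by hand, and the bit scan takes an explicit fallback value instead of depending on what bsf leaves in the destination for a zero source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the x86 32-bit shl, where the hardware
   keeps only the low 5 bits of the count. A count of 32 therefore
   behaves like a count of 0 and the value comes back unchanged. */
static uint32_t shl32_x86(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* Hypothetical helper: portable bit scan forward. Returns the index of
   the lowest set bit, or if_zero when value is 0, so the zero case is
   defined by the caller instead of by the processor. */
static int bsf32_checked(uint32_t value, int if_zero)
{
    int index = 0;

    if (value == 0)
        return if_zero;

    while ((value & 1u) == 0) {
        value >>= 1;
        ++index;
    }
    return index;
}

/* Small self-check showing the behavior described above. */
static void shift_and_scan_demo(void)
{
    assert(shl32_x86(0x12345678u, 32) == 0x12345678u); /* count masks to 0 */
    assert(shl32_x86(1u, 33) == 2u);                   /* count masks to 1 */
    assert(bsf32_checked(8u, -1) == 3);
    assert(bsf32_checked(0u, -1) == -1);               /* defined fallback */
}
```

A compiler is free to turn shl32_x86 into a single shl, since the mask matches what the instruction already does. The loop in bsf32_checked is only there for clarity; it is not meant to be fast.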
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.