Computer Chess Club Archives



Subject: A Few Comments

Author: Matt Taylor

Date: 11:01:56 01/19/03



Interesting to note that several of those routines rely on technically undefined
behavior. For the bsf instruction, the manual states, "...If the contents
of the source operand are 0, the contents of the destination operand are
undefined." Conveniently, the trick seems to work on all existing
implementations, which appear to leave the destination unchanged when the
source is 0.
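
For anyone who would rather not depend on that, here is a minimal sketch in C of a bit scan that handles a zero source explicitly. The name lowest_set_bit_or and its fallback parameter are hypothetical, not taken from the routines under discussion, and the loop is a plain portable stand-in rather than a tuned replacement for bsf.

```c
#include <stdint.h>

/* Hypothetical helper: returns the index of the lowest set bit of x,
 * or `fallback` when x is 0. The zero case is decided here in code
 * instead of relying on bsf leaving its destination untouched. */
static int lowest_set_bit_or(uint32_t x, int fallback)
{
    if (x == 0)
        return fallback;

    int index = 0;
    while ((x & 1u) == 0) {   /* x != 0, so this loop terminates */
        x >>= 1;
        ++index;
    }
    return index;
}
```

The explicit test costs a compare and a branch, which is exactly what the undefined-behavior trick avoids, so this is a correctness baseline more than a speed candidate.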

A similar trick can be used with shifts. On x86, 32-bit integer shift
instructions mask their shift count to the low 5 bits, so the count is taken
modulo 32. As a result, shifting by 32 acts as a shift by 0 and does not
change the destination operand.
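
To make the shift behavior concrete, here is a small sketch in C. In C itself, shifting a 32-bit value by 32 or more is undefined, so the masking is written out by hand. Both function names are hypothetical and only illustrate the two possible meanings of a shift by 32.

```c
#include <stdint.h>

/* Models a 32-bit x86 shl: only the low 5 bits of the count are used,
 * so a count of 32 behaves like a count of 0. */
static uint32_t shl_x86_style(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* The "mathematical" shift for comparison: counts of 32 or more
 * push every bit out and give 0. */
static uint32_t shl_full(uint32_t value, unsigned count)
{
    return count >= 32u ? 0u : value << count;
}
```

With a count of 32, shl_x86_style returns its input unchanged while shl_full returns 0. Code that leans on the hardware masking is really computing the first one.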

I probably won't optimize your code for Pentium 4. I was rather annoyed when
some code I wrote executed about as fast on my Pentium 90 as it would on a
high-end Pentium 4. All the old tricks are now expensive. Shifts have 4 clocks
of latency. The full adder (adc/sbb) is 2-3 clocks -throughput-; its latency is
6-8 clocks. The setcc instruction has 5 clocks of latency. Every one of these
instructions has a latency of 1 clock on Athlon and the original Pentium. They
all execute with a throughput of up to 3 instructions per cycle (1/3 clock
each) on Athlon and 2 instructions per cycle (1/2 clock each) on the original
Pentium. Sigh.
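
As a rough back-of-the-envelope sketch of what those numbers mean, the C below uses a deliberately simplified model: a chain of dependent instructions costs count times latency, and a run of independent instructions costs count divided by the issue rate, rounded up. The function names are hypothetical, and the model ignores decoding, pairing rules, and everything else a real pipeline does.

```c
/* Clocks for `count` instructions where each one needs the result of
 * the previous one: the latencies simply add up. */
static unsigned dependent_chain_clocks(unsigned count, unsigned latency)
{
    return count * latency;
}

/* Clocks for `count` independent instructions when up to `per_cycle`
 * of them can execute each clock (rounded up). */
static unsigned independent_clocks(unsigned count, unsigned per_cycle)
{
    return (count + per_cycle - 1u) / per_cycle;
}
```

Under this model, four dependent shifts cost dependent_chain_clocks(4, 4) = 16 clocks with the Pentium 4 latency quoted above, against dependent_chain_clocks(4, 1) = 4 clocks with a latency of 1.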

I'll optimize it for Athlon since I am now most familiar with its rules, and I
have tools to analyze the code. Taking a look now...

-Matt




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.