Computer Chess Club Archives


Search

Terms

Messages

Subject: Bitscan Conclusions

Author: Matt Taylor

Date: 13:19:21 01/05/03


There is truly nothing that could waste more time than trying to write -the-
most optimal routine possible, but this has been much fun for me. I am using a
tool now to read exact cycle counts for the AMD Athlon (K7) processor.

I have some insight now into the bsf instruction and its implementation on
different processors. The P5 (Pentium) and K6 processors both use microcode
loops, I believe. Not completely sure, but the timings would imply it as they
depend on the value of the operand. The P6 (PPro/Pentium 2/Pentium 3) implements
bsf/bsr with a priority encoder, which is why they are fast. The K7 (Athlon)
uses a binary search algorithm in microcode. I would presume the P7 is similar.

Someone had posed the question, "How will Hammer's bsf instruction fare?" The
answer is that it will be at most 1 cycle slower than the K7's bsf assuming it
is implemented the same way.

I am finishing up optimizing some of the proposed routines for Athlon. The bsf
routine takes something like 13-21 cycles depending on which word the bit lies
in. My favorite routine (binary search) optimized currently runs 15 cycles. I
think I can get yet another cycle or two back from it. Chances are that it will
run about as fast as the bsf instruction on Hammer. The difference is that it
costs almost 128 bytes and 4 registers instead of 3 bytes and 1 register.

Anyone interested in the final results can e-mail me for the optimized routines.
The link I posted a while ago is no longer valid as I will be switching ISPs
early next week.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.