Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: FirstBit() in assembler

Author: Bo Persson

Date: 10:58:56 01/15/02

On January 15, 2002 at 12:53:02, Dan Newman wrote:

>On January 15, 2002 at 04:36:17, Severi Salminen wrote:
>
>>>>I think something like this should work:
>>>>
>>>>        bsr eax dword ptr bitboard
>>>>        sub eax 32
>>>>        bsr eax dword ptr bitboard+4
>>>>        add eax 32
>>>>
>>>>which makes using last-bit a bit more expensive, but it's still
>>>>straight line code...
>>>>
>>>>-Dan.
>>>
>>>Oh yeah, of course. I can't see why I didn't see that myself...
>>
>>The same here too, the more you try to think things, the dumber you'll get :(
>>The above is of course the fastest version. But how expensive is bsr/bsf?? Is it
>>allways the same or does it take more cycles if the first 1-bit is farther away?
>>This will make a big impact and a version with jmp might be faster.
>>
>>Severi
>
>Hey, I spent many a long minute figuring that out :).
>
>As I understand it, the P3 does BSF/BSR in constant time taking either 2
>or 3 micro-ops depending on addressing mode (from Intel's P3 optimization
>manual).  (I'm not sure how this translates into clock cylcles, but I'd
>guess that means 2 or 3 or them.)  BSF/BSR used to take (on earlier x86's
>before the P2) a very large number of clocks and may well have depended
>on bit position.

The historic track is even worse! When the instructions were first introduced in
the 386, they were of course very fast for that processor. On the Pentium
generation a simple table lookup was suddenly *much* faster than these
specialized instructions.

Then on the Pentium II/III generation BSR/BSF were suddenly down to 1 or 2
clocks, but Intel didn't care to tell anyone. I learned it here!

>-Dan.


Bo Persson
bop2@telia.com

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.