Author: Dan Newman
Date: 09:53:02 01/15/02
Go up one level in this thread
On January 15, 2002 at 04:36:17, Severi Salminen wrote: >>>I think something like this should work: >>> >>> bsr eax dword ptr bitboard >>> sub eax 32 >>> bsr eax dword ptr bitboard+4 >>> add eax 32 >>> >>>which makes using last-bit a bit more expensive, but it's still >>>straight line code... >>> >>>-Dan. >> >>Oh yeah, of course. I can't see why I didn't see that myself... > >The same here too, the more you try to think things, the dumber you'll get :( >The above is of course the fastest version. But how expensive is bsr/bsf?? Is it >allways the same or does it take more cycles if the first 1-bit is farther away? >This will make a big impact and a version with jmp might be faster. > >Severi Hey, I spent many a long minute figuring that out :). As I understand it, the P3 does BSF/BSR in constant time taking either 2 or 3 micro-ops depending on addressing mode (from Intel's P3 optimization manual). (I'm not sure how this translates into clock cylcles, but I'd guess that means 2 or 3 or them.) BSF/BSR used to take (on earlier x86's before the P2) a very large number of clocks and may well have depended on bit position. -Dan.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.