Author: Bo Persson
Date: 10:58:56 01/15/02
Go up one level in this thread
On January 15, 2002 at 12:53:02, Dan Newman wrote: >On January 15, 2002 at 04:36:17, Severi Salminen wrote: > >>>>I think something like this should work: >>>> >>>> bsr eax dword ptr bitboard >>>> sub eax 32 >>>> bsr eax dword ptr bitboard+4 >>>> add eax 32 >>>> >>>>which makes using last-bit a bit more expensive, but it's still >>>>straight line code... >>>> >>>>-Dan. >>> >>>Oh yeah, of course. I can't see why I didn't see that myself... >> >>The same here too, the more you try to think things, the dumber you'll get :( >>The above is of course the fastest version. But how expensive is bsr/bsf?? Is it >>allways the same or does it take more cycles if the first 1-bit is farther away? >>This will make a big impact and a version with jmp might be faster. >> >>Severi > >Hey, I spent many a long minute figuring that out :). > >As I understand it, the P3 does BSF/BSR in constant time taking either 2 >or 3 micro-ops depending on addressing mode (from Intel's P3 optimization >manual). (I'm not sure how this translates into clock cylcles, but I'd >guess that means 2 or 3 or them.) BSF/BSR used to take (on earlier x86's >before the P2) a very large number of clocks and may well have depended >on bit position. The historic track is even worse! When the instructions were first introduced in the 386, they were of course very fast for that processor. On the Pentium generation a simple table lookup was suddenly *much* faster than these specialized instructions. Then on the Pentium II/III generation BSR/BSF were suddenly down to 1 or 2 clocks, but Intel didn't care to tell anyone. I learned it here! >-Dan. Bo Persson bop2@telia.com
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.