Author: Walter Faxon
Date: 22:42:31 12/22/02
Go up one level in this thread
On December 22, 2002 at 20:10:45, Matt Taylor wrote: >snip< >I've added several tests using your 32-bit routine. It actually works a little >better, I think. (It also helps that the table is 20 bytes smaller -- but that's >mostly irrelevant.) I also added the "naive" method which looks roughly like >this: > >int shift_scan(bitboard bb) >{ > index = 0; > while(bb != 0 && (bb & 1) == 0) > { > index++; > bb >>= 1; > } > > return index; >} > >One other interesting possibility opens up due to using a 32-bit scan routine >like that: parallelism! The MMX routine performs poorly when applied to the >64-bit algorithm because it is difficult to fold at the end. I'm working on a >32-bit version. Initially it was -almost- the fastest bitscan yet, but then I >realized that I had a bug. I can do the bitscans in parallel, but I can't >actually clear both bits -- only the one I return. The current code ends up >throwing a bit away, cutting the time in half. After I fix the bug, we'll see >how it changes things. > >You know, it would also be interesting also to have a routine return two bits >and do some unrolling. It would make MMX invaluable as you could theoretically >grab 8 bits at a time. > >-Matt Hey, Matt. Well, as long as you're adding other test code, below is one more 32-bit routine, adapted from a post by Gerd last June (in fact, made super-pedantic). No table required. -- Walter Code follows: // --------------------------------------------------------------------------- // LSB_32x() -- find, remove, report least-significant bit of 32. // Argument must be non-null. Method: convert conditional tests to 0/1. // Based on 64-bit code posted to CCC by Gerd Isenberg, June 18, 2002. // Written by Walter Faxon, December 2002. No copyright. No warranty. // inline // may differ by compiler int LSB_32x( u32* value_p ) { u32 copy, bit, index; copy = *value_p; // get 32-bit value bit = -copy; // get LSB as single bit.. // // NOTE: at this point, normal assembly code would leave the // zero flag set when the input (*value_p == 0). If a "break" or // "goto" on that flag is possible, one could for example, loop // without any leading test for (*value_p != 0). // bit &= copy; // ..done copy ^= bit; // remove from value *value_p = copy; // and store back // convert conditional tests to 0/1, building binary number index = ((bit & 0xFFFF0000) != 0); index <<= 1; index |= ((bit & 0xFF00FF00) != 0); index <<= 1; index |= ((bit & 0xF0F0F0F0) != 0); index <<= 1; index |= ((bit & 0xCCCCCCCC) != 0); index <<= 1; index |= ((bit & 0xAAAAAAAA) != 0); return index; } //eof
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.