Author: Gerd Isenberg
Date: 08:01:41 05/01/04
On May 01, 2004 at 05:48:08, Matthias Gemuh wrote:

>Hi Windows-Programmers,
>Is there Assembler code for LSB(), MSB(), or PopCount() for Athlon
>that is faster than C counterpart ?
>Thanks,
>Matthias.

Hi Matthias,

yes for bitscan: inline assembly with a pair of bitscan instructions is probably the fastest on x86-32. But MSC6 inline assembly has a drawback with __forceinline functions: parameters are passed via the stack again, even if they are already in registers, which costs two unnecessary store/loads. GCC inline assembly is a bit smarter here.

Therefore, for LSB you may also try Walter Faxon's magic bitscan or Matt Taylor's folded de Bruijn multiplication in pure C on x86-32, or for MSB some lookup routine, e.g. the one from Eugene Nalimov.

For popcount it is probably not worth using assembly at all. For sparsely populated words the loop version with x &= x-1 is quite ok. The SWAR popcount in C is fine too. I actually still use the MMX/3DNow version from AMD's Athlon32 optimization manual, but I count at least two bitboards simultaneously.

For AMD64 you may use the bsf/bsr intrinsics with the new MS compiler, inline assembly with GNU C, or portable C, e.g. a 64-bit de Bruijn multiplication with lookup.

There are rumors about an undocumented 64-bit popcount instruction for AMD64.

Gerd
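
For illustration, here is a minimal sketch in portable C of three of the routines named in the post: the x &= x-1 loop popcount, a SWAR popcount, and a 64-bit de Bruijn multiplication with lookup for LSB. The function names (popcount_loop, popcount_swar, bitscan_forward, init_debruijn64) are made up for this sketch, and the constant 0x03f79d71b4cb0a89 is just one commonly used 64-bit de Bruijn sequence. This is the plain 64-bit multiplication, not Matt Taylor's folded 32-bit routine and not Walter Faxon's magic bitscan. The lookup table is filled at startup rather than hard-coded, so it cannot drift out of step with the constant.

```c
#include <stdint.h>

/* Loop popcount: each pass of x &= x - 1 clears the lowest set bit,
   so the loop runs once per set bit (cheap for sparse words). */
int popcount_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* SWAR popcount: sum bits in parallel within 2-, 4- and 8-bit fields,
   then add up the eight byte counts with one multiplication. */
int popcount_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Assumed constant: a 64-bit de Bruijn sequence. Every 6-bit window of
   it is distinct, so (constant << i) >> 58 is a unique index per i. */
static const uint64_t DEBRUIJN64 = 0x03f79d71b4cb0a89ULL;
static int debruijn_index[64];

/* Fill the lookup table; must be called once before bitscan_forward. */
void init_debruijn64(void)
{
    int i;
    for (i = 0; i < 64; i++)
        debruijn_index[(DEBRUIJN64 << i) >> 58] = i;
}

/* Index (0..63) of the least significant set bit. bb must be nonzero.
   bb & (0 - bb) isolates the lowest set bit, i.e. 1 << i, so the
   multiplication is the same as shifting the constant left by i. */
int bitscan_forward(uint64_t bb)
{
    return debruijn_index[((bb & (0 - bb)) * DEBRUIJN64) >> 58];
}
```

Typical use on a bitboard would be to call init_debruijn64() once at program start, then loop with bitscan_forward(bb) followed by bb &= bb - 1 to visit each set bit in turn. Whether this beats bsf/bsr on a given CPU and compiler has to be measured.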
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.