Author: Matthias Gemuh
Date: 08:56:41 05/01/04
On May 01, 2004 at 11:01:41, Gerd Isenberg wrote:

>On May 01, 2004 at 05:48:08, Matthias Gemuh wrote:
>
>>Hi Windows-Programmers,
>>Is there assembler code for LSB(), MSB(), or PopCount() on Athlon
>>that is faster than its C counterpart?
>>Thanks,
>>Matthias.
>
>Hi Matthias,
>
>yes, for bitscan, inline assembly with a pair of bitscan instructions is
>probably the fastest on x86-32. But MSC6 inline assembly has a drawback with
>__forceinline'd functions: it passes parameters via the stack again, even if
>they are already in registers, and does two unnecessary stores/loads. GCC inline
>assembly is a bit smarter here. Therefore, for LSB you may try Walter Faxon's
>magic bitscan or Matt Taylor's folded de Bruijn multiplication in pure C on
>x86-32 too, and for MSB some lookup routine, e.g. the one from Eugene Nalimov.
>
>For popcount it is probably not worth using assembly at all.
>For sparsely populated words the loop version with x &= x-1 is quite OK.
>The SWAR popcount in C is fine too. I actually still use the MMX/3DNow! version
>from AMD's Athlon32 optimization manual, but I count at least two bitboards
>simultaneously.
>
>For AMD64 you may use bsf/bsr intrinsics with the new MS compiler, inline
>assembly with GNU C, or portable C, e.g. 64-bit de Bruijn multiplication
>with a table lookup. There are rumors about an undocumented 64-bit popcount
>instruction for AMD64.
>
>Gerd

Hi Gerd,

thanks for the detailed answer!
My compiler is Borland C++ Builder 5.
My tests show that, with my compiler, the C versions are faster for all three
functions on both Intel and AMD chips, except MSB() and LSB() on Intel, where
shifting things around does help.

Best,
Matthias.
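A sketch of the folding idea Gerd mentions for LSB on x86-32, written from the commonly published form of Matt Taylor's trick rather than from anything in this thread: `bb ^ (bb - 1)` keeps the lowest set bit and everything below it, the 64-bit mask is folded into 32 bits with an XOR, and a 32-bit multiply by the magic constant 0x78291ACF followed by `>> 26` yields a distinct 6-bit index for each bit position. The names `lsb_init`, `lsb_hash` and `lsb` are hypothetical, and the table is built at startup from the constant instead of being hard-coded. Only 32-bit multiplications are needed, which is what makes it attractive on x86-32.

```c
#include <stdint.h>

/* Maps the 6-bit hash of a bitboard to its LSB index; filled by lsb_init(). */
static int lsb_64_table[64];

/* Folding hash (assumed constant 0x78291ACF):
   1. bb ^ (bb - 1) sets every bit up to and including the lowest set bit,
   2. XOR the high half onto the low half to get a 32-bit value,
   3. multiply by the magic constant and keep the top 6 bits of the product. */
static unsigned lsb_hash(uint64_t bb)
{
    bb ^= bb - 1;
    uint32_t folded = (uint32_t)bb ^ (uint32_t)(bb >> 32);
    return (uint32_t)(folded * 0x78291ACFu) >> 26;
}

/* Hypothetical init: derive the table from the hash of each single bit. */
void lsb_init(void)
{
    for (int i = 0; i < 64; i++)
        lsb_64_table[lsb_hash((uint64_t)1 << i)] = i;
}

/* Index (0..63) of the least significant set bit; bb must be non-zero. */
int lsb(uint64_t bb)
{
    return lsb_64_table[lsb_hash(bb)];
}
```

Call `lsb_init()` once, e.g. at program start, before the first `lsb()`.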
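Gerd points to a lookup routine for MSB but does not give Nalimov's code, so the following is only a generic byte-table sketch in that spirit, not Nalimov's routine: up to three comparisons narrow the bitboard down to its highest non-zero byte, and a 256-entry table supplies the bit index within that byte. `msb_init`, `msb` and `msb_table` are hypothetical names, and the result is undefined for an empty bitboard.

```c
#include <stdint.h>

/* msb_table[b] = index of the highest set bit of byte b; filled by msb_init(). */
static signed char msb_table[256];

void msb_init(void)
{
    msb_table[0] = -1;  /* no set bit; must be set first, the loop builds on it */
    for (int b = 1; b < 256; b++)
        msb_table[b] = (signed char)(msb_table[b >> 1] + 1);
}

/* Index (0..63) of the most significant set bit; bb must be non-zero. */
int msb(uint64_t bb)
{
    int result = 0;
    if (bb > 0xFFFFFFFFu) { bb >>= 32; result += 32; }  /* high half in use */
    if (bb > 0xFFFFu)     { bb >>= 16; result += 16; }
    if (bb > 0xFFu)       { bb >>= 8;  result += 8;  }
    return result + msb_table[bb];                      /* bb is now 1..255 */
}
```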
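The two C popcounts Gerd mentions can be sketched as follows, assuming 64-bit bitboards held in `uint64_t`. The loop runs once per set bit, so it pays off on sparsely populated words; the SWAR version adds bits in parallel within 2-, 4- and 8-bit fields and then sums the eight byte counts with one multiplication, so its cost does not depend on the population. The function names are hypothetical.

```c
#include <stdint.h>

/* Loop version: x &= x - 1 clears the lowest set bit, so the loop body
   executes exactly once per set bit. */
int popcount_sparse(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* SWAR version: constant number of operations regardless of population. */
int popcount_swar(uint64_t x)
{
    const uint64_t k1 = 0x5555555555555555ULL; /* 0101...: 2-bit fields  */
    const uint64_t k2 = 0x3333333333333333ULL; /* 0011...: 4-bit fields  */
    const uint64_t k4 = 0x0F0F0F0F0F0F0F0FULL; /* 00001111: byte fields  */
    const uint64_t kf = 0x0101010101010101ULL; /* sums all bytes into the top byte */
    x = x - ((x >> 1) & k1);            /* count of each 2-bit field */
    x = (x & k2) + ((x >> 2) & k2);     /* count of each 4-bit field */
    x = (x + (x >> 4)) & k4;            /* count of each byte        */
    return (int)((x * kf) >> 56);       /* total in the top byte     */
}
```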
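For the portable AMD64 route, a minimal sketch of 64-bit de Bruijn multiplication with lookup, assuming the widely used constant 0x03F79D71B4CB0A89: isolating the lowest set bit gives a power of two, multiplying by the de Bruijn constant shifts a unique 6-bit window into the top bits, and `>> 58` extracts it as a table index. `bitscan_init`, `bitscan_forward` and `index64` are hypothetical names; the table is derived at startup from the constant rather than typed in.

```c
#include <stdint.h>

/* Assumed 64-bit de Bruijn constant: every 6-bit window is unique. */
#define DEBRUIJN64 0x03F79D71B4CB0A89ULL

static int index64[64];

/* Hypothetical init: record which top-6-bit window each power of two yields. */
void bitscan_init(void)
{
    for (int i = 0; i < 64; i++)
        index64[(((uint64_t)1 << i) * DEBRUIJN64) >> 58] = i;
}

/* Index (0..63) of the least significant set bit; bb must be non-zero. */
int bitscan_forward(uint64_t bb)
{
    uint64_t ls1b = bb & (~bb + 1);  /* isolate lowest set bit (two's complement) */
    return index64[(ls1b * DEBRUIJN64) >> 58];
}
```

Unlike the folded version, this needs a full 64-bit multiply, which is cheap on AMD64 but costs several instructions on x86-32.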
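The compiler-intrinsic route for AMD64 might look like this. On 64-bit MSVC it assumes `_BitScanForward64`/`_BitScanReverse64` from `<intrin.h>`; elsewhere it substitutes the GCC/Clang builtins `__builtin_ctzll`/`__builtin_clzll`, which are builtins rather than the GNU inline assembly Gerd mentions, though compilers typically emit bsf/bsr (or tzcnt/lzcnt) for them. Both paths are undefined for a zero argument, so callers must test for an empty bitboard first. `lsb64` and `msb64` are hypothetical names.

```c
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#pragma intrinsic(_BitScanForward64, _BitScanReverse64)
#endif

/* Index of the least significant set bit; bb must be non-zero. */
static inline int lsb64(uint64_t bb)
{
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanForward64(&index, bb);
    return (int)index;
#else
    return __builtin_ctzll(bb);        /* count trailing zeros */
#endif
}

/* Index of the most significant set bit; bb must be non-zero. */
static inline int msb64(uint64_t bb)
{
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanReverse64(&index, bb);
    return (int)index;
#else
    return 63 - __builtin_clzll(bb);   /* 63 minus count of leading zeros */
#endif
}
```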
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.