Computer Chess Club Archives


Subject: Re: Assembler for LSB() and MSB() on AMD processor ?

Author: Matthias Gemuh

Date: 08:56:41 05/01/04


On May 01, 2004 at 11:01:41, Gerd Isenberg wrote:

>On May 01, 2004 at 05:48:08, Matthias Gemuh wrote:
>
>>
>>
>>Hi Windows-Programmers,
>>Is there assembler code for LSB(), MSB(), or PopCount() on Athlon
>>that is faster than its C counterpart?
>>Thanks,
>>Matthias.
>
>Hi Matthias,
>
>Yes. For bitscan, inline assembly with a pair of bitscan instructions is
>probably the fastest on x86-32. But MSC6 inline assembly has a drawback with
>__forceinline'd functions: parameters are passed via the stack again, even if
>they are already in registers, which costs two unnecessary store/loads. GCC
>inline assembly is a bit smarter here. Therefore, for LSB you may also try
>Walter Faxon's magic bitscan or Matt Taylor's folded de Bruijn multiplication
>in pure C on x86-32, or for MSB some lookup routine, e.g. the one from Eugene
>Nalimov.
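
To make the "lookup routine" idea for MSB concrete, here is a minimal sketch in portable C. It is not Eugene Nalimov's actual code, only an illustration of the general technique: narrow the 64-bit word down to its highest non-zero byte, then look that byte up in a 256-entry table. The names msb_of_byte, init_msb_table and msb64 are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* msb_of_byte[b] = index (0..7) of the highest set bit in byte b.
   Entry 0 is a placeholder, since a zero byte has no set bit. */
static unsigned char msb_of_byte[256];

/* Fill the table once at program start (hypothetical helper). */
static void init_msb_table(void)
{
    msb_of_byte[0] = 0;
    msb_of_byte[1] = 0;
    for (int i = 2; i < 256; i++)
        msb_of_byte[i] = (unsigned char)(msb_of_byte[i >> 1] + 1);
}

/* Index (0..63) of the most significant set bit; bb must be non-zero. */
static int msb64(uint64_t bb)
{
    uint32_t hi = (uint32_t)(bb >> 32);
    uint32_t word;
    int base;

    assert(bb != 0);

    /* Pick the upper 32-bit half if it has any bit set, else the lower. */
    if (hi) { word = hi;           base = 32; }
    else    { word = (uint32_t)bb; base = 0;  }

    /* Narrow down to the highest non-zero byte. */
    if (word >> 16) { word >>= 16; base += 16; }
    if (word >> 8)  { word >>= 8;  base += 8;  }

    return base + msb_of_byte[word];
}
```

init_msb_table() has to be called once before the first msb64() call. Working on 32-bit halves keeps every shift and compare within a single register on x86-32.
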
>
>For popcount it is probably not worth using assembly at all.
>For sparsely populated words the loop version with x &= x-1 is quite OK.
>The SWAR popcount in C is fine too. I actually still use the MMX/3DNow version
>from AMD's Athlon32 optimization manual, but I count at least two bitboards
>simultaneously.
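
For reference, the two C popcount variants mentioned above can be sketched as follows. Both are the standard textbook forms; the function names popcount_loop and popcount_swar are chosen only for this example, and the MMX/3DNow version is not shown.

```c
#include <stdint.h>

/* Loop version: one iteration per set bit, so it is cheap for
   sparsely populated bitboards and slower for dense ones. */
static int popcount_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* SWAR version: a fixed number of operations regardless of input.
   It sums bits in parallel within 2-, 4- and 8-bit fields, then adds
   the eight byte counts with a single multiplication. */
static int popcount_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

Which one wins depends on how many bits are typically set: the loop costs roughly one iteration per piece, while the SWAR version always costs the same.
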
>
>For AMD64 you may use bsf/bsr intrinsics with the new MS compiler, inline
>assembly with GNU C, or portable C, e.g. 64-bit de Bruijn multiplication
>with lookup. There are rumors about an undocumented 64-bit popcount
>instruction for AMD64.
>
>Gerd
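
The portable "64-bit de Bruijn multiplication with lookup" for LSB could look roughly like the sketch below. The constant is one commonly used 64-bit de Bruijn sequence and is an assumption here, not something taken from Gerd's post; the table is built at startup from the constant itself, so no hand-typed table is needed. DEBRUIJN64, lsb_index, init_lsb_table and lsb64 are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant: a commonly used 64-bit de Bruijn sequence. Every
   6-bit window of it is distinct, which makes the hash below perfect. */
#define DEBRUIJN64 0x03F79D71B4CB0A89ULL

static int lsb_index[64];

/* Build the lookup table: multiplying by a single bit 1<<i shifts the
   sequence left by i, and the top 6 bits then identify i uniquely. */
static void init_lsb_table(void)
{
    for (int i = 0; i < 64; i++)
        lsb_index[(((uint64_t)1 << i) * DEBRUIJN64) >> 58] = i;
}

/* Index (0..63) of the least significant set bit; bb must be non-zero. */
static int lsb64(uint64_t bb)
{
    assert(bb != 0);
    /* bb & (0 - bb) isolates the lowest set bit (unsigned negation). */
    return lsb_index[((bb & (0 - bb)) * DEBRUIJN64) >> 58];
}
```

init_lsb_table() must run once before lsb64() is used. The 64-bit multiply is cheap on AMD64, while on x86-32 it has to be pieced together from 32-bit multiplies, which is why the folded 32-bit variant is suggested for that target.
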



Hi Gerd,
Thanks for the detailed answer!
My compiler is Borland C++ Builder 5.
My tests show that with my compiler the C code does better for all three
functions on Intel and AMD chips, except for MSB() and LSB() on Intel, where
shifting things around does help.
Best,
Matthias.







Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.