Computer Chess Club Archives


Subject: Re: Assembler for LSB() and MSB() on AMD processor ?

Author: Gerd Isenberg

Date: 08:01:41 05/01/04


On May 01, 2004 at 05:48:08, Matthias Gemuh wrote:

>
>
>Hi Windows-Programmers,
>Is there Assembler code for LSB(), MSB(), or PopCount() for Athlon
>that is faster than C counterpart ?
>Thanks,
>Matthias.

Hi Matthias,

yes for bitscan: inline assembly with a pair of bitscan instructions (bsf/bsr)
is probably the fastest on x86-32. But MSC6 inline assembly has a drawback with
__forceinline functions: parameters are passed via the stack again, even if
they are already in registers, which costs two unnecessary store/loads. GCC
inline assembly is a bit smarter here. Therefore for LSB you may also try
Walter Faxon's magic bitscan or Matt Taylor's folded de Bruijn multiplication
in pure C on x86-32, and for MSB some lookup routine, e.g. the one from Eugene
Nalimov.
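
To illustrate what a lookup routine for MSB can look like, here is a minimal
sketch in plain C. It is not Eugene Nalimov's actual code; the names
msb_of_byte, init_msb_table and msb64_lookup are made up for this example. It
narrows the 64-bit word down to one byte with a few compares and then reads the
answer from a 256-entry table.

```c
#include <stdint.h>

/* msb_of_byte[b] = index of the highest set bit in byte b (b >= 1).
   Entry 0 is never used because the input must be non-zero.
   Hypothetical names, for illustration only. */
static unsigned char msb_of_byte[256];

/* Fill the byte table once at program start. */
void init_msb_table(void)
{
    int i;
    msb_of_byte[0] = 0;
    msb_of_byte[1] = 0;
    for (i = 2; i < 256; i++)
        msb_of_byte[i] = (unsigned char)(1 + msb_of_byte[i >> 1]);
}

/* Index (0..63) of the most significant set bit; bb must be non-zero. */
int msb64_lookup(uint64_t bb)
{
    uint32_t hi = (uint32_t)(bb >> 32);
    uint32_t w;
    int base;

    /* Pick the non-empty 32-bit half first, as one would on x86-32. */
    if (hi) { w = hi;           base = 32; }
    else    { w = (uint32_t)bb; base = 0;  }

    /* Narrow down to the highest non-empty byte. */
    if (w >> 16) { w >>= 16; base += 16; }
    if (w >> 8)  { w >>= 8;  base += 8;  }

    return base + msb_of_byte[w];
}
```

init_msb_table() has to be called once before the first msb64_lookup(). Whether
this beats bsr on a given Athlon depends on compiler and call context, so it
needs measuring.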

For popcount it is probably not worth using assembly at all.
For sparsely populated words the loop version with x &= x-1 is quite OK.
The SWAR popcount in C is fine too. I actually still use the MMX/3DNow version
from AMD's Athlon32 optimization manual, but I count at least two bitboards
simultaneously.
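
The two C variants could be sketched like this (function names are my own
choice for the example, and this is the common textbook form of the SWAR count,
not the MMX/3DNow code from the AMD manual):

```c
#include <stdint.h>

/* Loop version: x &= x-1 clears the lowest set bit, so the loop runs
   once per set bit. Cheap for sparsely populated words. */
int popcount_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* SWAR version: add up bits in parallel inside the word, first in
   2-bit fields, then 4-bit fields, then bytes. The final multiply
   sums all byte counts into the top byte. */
int popcount_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

The loop version costs time proportional to the number of set bits, the SWAR
version takes the same time for every input.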

For AMD64 you may use the bsf/bsr intrinsics with the new MS compiler, inline
assembly with GNU C, or portable C, e.g. a 64-bit de Bruijn multiplication
with lookup. There are rumors about an undocumented 64-bit popcount instruction
for AMD64.
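
A minimal sketch of the portable variant, assuming the widely used de Bruijn
constant 0x03f79d71b4cb0a89; the names DEBRUIJN64, debruijn_index,
init_debruijn_table and lsb64_debruijn are made up for this example. The lookup
table is filled at startup from the constant itself instead of being typed in
by hand.

```c
#include <stdint.h>

/* A 64-bit de Bruijn sequence: every 6-bit window of it is unique,
   so (single_bit * DEBRUIJN64) >> 58 is a perfect hash of the bit index. */
static const uint64_t DEBRUIJN64 = 0x03f79d71b4cb0a89ULL;

/* Maps the 6-bit hash back to the bit index. */
static int debruijn_index[64];

/* Fill the table once at program start. */
void init_debruijn_table(void)
{
    int i;
    for (i = 0; i < 64; i++)
        debruijn_index[(((uint64_t)1 << i) * DEBRUIJN64) >> 58] = i;
}

/* Index (0..63) of the least significant set bit; bb must be non-zero. */
int lsb64_debruijn(uint64_t bb)
{
    /* bb & (0 - bb) isolates the lowest set bit. */
    return debruijn_index[((bb & (0 - bb)) * DEBRUIJN64) >> 58];
}
```

This needs one 64-bit multiply, which is cheap on AMD64 but less attractive on
x86-32; that is why the folded 32-bit variant exists for the older processors.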

Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700
