Author: Rafael Andrist
Date: 02:38:49 05/12/01
Go up one level in this thread
On May 12, 2001 at 00:47:36, Dann Corbit wrote:
>For machines with MMX, this is definitely faster than any other method no matter
>how many bits are populated (1 to 32) on a 64 bit bitboard.
>
>#include "amd3dx.h"
>
>/* Athlon MMX optimized popcount function, based on AMD's Athlon
> optimization manuals */
>
>unsigned PopulationCount (unsigned __int64 v)
>{
> static const __int64 C55 = 0x5555555555555555;
> static const __int64 C33 = 0x3333333333333333;
> static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
>
> __asm {
> movd mm0, word ptr v;
> punpckldq mm0, word ptr v + 4;
> movq mm1,mm0;
> psrld mm0,1;
> pand mm0,[C55];
> psubd mm1,mm0;
> movq mm0,mm1;
> psrld mm1,2;
> pand mm0,[C33];
> pand mm1,[C33];
> paddd mm0,mm1;
> movq mm1,mm0;
> psrld mm0,4;
> paddd mm0,mm1;
> pand mm0,[C0F];
> pxor mm1,mm1;
> psadbw (mm0,mm1);
> movd eax,mm0;
> emms;
> }
>}
The psadbw instruction is not a standard MMX command, a PIII or a AMD K6 is
necessary. To make this algorithm MMX-compatible, you must replace this psadbw
instruction by several others, and this slows the code down.
Rafael B. Andrist
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.