Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: The fastest popcount

Author: Rafael Andrist

Date: 02:38:49 05/12/01

Go up one level in this thread


On May 12, 2001 at 00:47:36, Dann Corbit wrote:

>For machines with MMX, this is definitely faster than any other method no matter
>how many bits are populated (1 to 32) on a 64 bit bitboard.
>
>#include "amd3dx.h"
>
>/* Athlon MMX optimized popcount function, based on AMD's Athlon
>   optimization manuals */
>
>unsigned PopulationCount (unsigned __int64 v)
>{
> static const __int64 C55 = 0x5555555555555555;
> static const __int64 C33 = 0x3333333333333333;
> static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
>
> __asm {
>  movd mm0, word ptr v;
>  punpckldq mm0, word ptr v +  4;
>  movq mm1,mm0;
>  psrld mm0,1;
>  pand mm0,[C55];
>  psubd mm1,mm0;
>  movq mm0,mm1;
>  psrld mm1,2;
>  pand mm0,[C33];
>  pand mm1,[C33];
>  paddd mm0,mm1;
>  movq mm1,mm0;
>  psrld mm0,4;
>  paddd mm0,mm1;
>  pand mm0,[C0F];
>  pxor mm1,mm1;
>  psadbw (mm0,mm1);
>  movd eax,mm0;
>  emms;
> }
>}

The psadbw instruction is not a standard MMX command, a PIII or a AMD K6 is
necessary. To make this algorithm MMX-compatible, you must replace this psadbw
instruction by several others, and this slows the code down.

Rafael B. Andrist



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.