Author: Gerd Isenberg
Date: 06:44:30 04/21/03
On April 19, 2003 at 12:34:22, Dieter Buerssner wrote:
>On April 19, 2003 at 05:29:13, Gerd Isenberg wrote:
>
>>Yes Dieter, I tried a lot.
>
>Gerd, I have no doubt about this. I only questioned the 2 specific routines (the
>loops on the 2 32 bit parts, once in assembler, once in C). I questioned it
>only, because I looked at the generated assembler of the C-Code, which was
>practically identical, to the assembly you showed.
>
>Anyway, I made a small test program. dieter() is the C-Code I had shown, gerd()
>your assembly (I wrote the same in GCC-style, too), and crafty() the routine Uri
>showed in this thread. In my test, gerd() was not faster than dieter() (dieter()
>was actually slightly faster than gerd() using MSVC, and the same using gcc). As
>should be expected, with this specific test, crafty() was a bit slower (much
>more with gcc than with MSVC). I used -O2 for both compilers. No doubt, one can
>argue about my testing procedure. I actually made it out of fun only, not to do
>some scientific experiments. I also used a union for bitboard, and changed for
>this the functions in the thread slightly. I just did not want to try which
>tricks would work best to convince the compiler to extract the 2 32-bit parts
>efficiently.
>
>Regards,
>Dieter
>
>MYINLINE int dieter(BU a)
>{
>  unsigned long w;
>  int n = 0;
>  w = a.w[0];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  w = a.w[1];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  return n;
>}
>
Yes, Dieter, confirmed: your C routine is the fastest. As you already mentioned,
it is the same code apart from parameter passing and the choice of registers.
That is the mess with MSC inline assembly: it cannot take parameters in
registers. Thanks for pointing that out.
Regards,
Gerd
static __forceinline int BitCount8 (BitBoard bb)
{
  register UINT32 w;
  register int n = 0;
  w = LOWBOARD(bb);
  if (w) do n++; while ((w &= w-1) != 0);
  w = HIGHBOARD(bb);
  if (w) do n++; while ((w &= w-1) != 0);
  return n;
}
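For anyone who wants to try the routine outside my program, here is a minimal portable sketch. The definitions of BitBoard, LOWBOARD and HIGHBOARD are assumptions for illustration only (they are not shown in this post), and the names count32, bit_count and bit_count_selfcheck are hypothetical. Fixed-width uint32_t is used so that each half really is 32 bits wide on any compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions; the post does not show the real ones. */
typedef uint64_t BitBoard;
#define LOWBOARD(bb)  ((uint32_t)(bb))
#define HIGHBOARD(bb) ((uint32_t)((bb) >> 32))

/* Count the set bits of one 32-bit half.
   Each pass of the loop clears the lowest set bit with w &= w - 1,
   so the loop runs once per set bit. */
int count32(uint32_t w)
{
    int n = 0;
    while (w != 0) {
        n++;
        w &= w - 1;
    }
    return n;
}

/* Population count of a 64-bit bitboard, done as two 32-bit loops. */
int bit_count(BitBoard bb)
{
    return count32(LOWBOARD(bb)) + count32(HIGHBOARD(bb));
}

/* Small self-check on values whose bit counts are easy to verify by hand. */
void bit_count_selfcheck(void)
{
    assert(bit_count(0) == 0);
    assert(bit_count(1) == 1);
    assert(bit_count(0x8000000000000001ULL) == 2);
    assert(bit_count(0xFFFFFFFFFFFFFFFFULL) == 64);
}
```

Because the loop count equals the number of set bits, this style pays off mainly on sparsely populated bitboards.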
The assembler output of your inlined C routine, including a few preceding
instructions that prepare the bitboard and the __asm int 3 breakpoint:
0042CC05  81 E6 FE FE FE FE  and   esi,0FEFEFEFEh
0042CC0B  25 FE FE FE 00     and   eax,0FEFEFEh
0042CC10  0B CE              or    ecx,esi
0042CC12  0B D0              or    edx,eax
0042CC14  CC                 int   3
0042CC15  33 C0              xor   eax,eax
0042CC17  85 C9              test  ecx,ecx
0042CC19  74 08              je    0042CC23
0042CC1B  8D 71 FF           lea   esi,[ecx-1]
0042CC1E  40                 inc   eax
0042CC1F  23 CE              and   ecx,esi
0042CC21  75 F8              jne   0042CC1B
0042CC23  5F                 pop   edi
0042CC24  5E                 pop   esi
0042CC25  85 D2              test  edx,edx
0042CC27  8B CA              mov   ecx,edx
0042CC29  5B                 pop   ebx
0042CC2A  74 08              je    0042CC34
0042CC2C  8D 51 FF           lea   edx,[ecx-1]
0042CC2F  40                 inc   eax
0042CC30  23 CA              and   ecx,edx
0042CC32  75 F8              jne   0042CC2C
The inlined asm code, with unnecessary parameter passing via the stack (the two
stores to [ebp-8] and [ebp-4] before the breakpoint and the two reloads after it):
0042CCFB  81 E2 FE FE FE FE  and   edx,0FEFEFEFEh
0042CD01  25 FE FE FE 00     and   eax,0FEFEFEh
0042CD06  0B CA              or    ecx,edx
0042CD08  0B D8              or    ebx,eax
0042CD0A  89 4D F8           mov   dword ptr [ebp-8],ecx
0042CD0D  89 5D FC           mov   dword ptr [ebp-4],ebx
0042CD10  CC                 int   3
0042CD11  8B 4D F8           mov   ecx,dword ptr [ebp-8]
0042CD14  33 C0              xor   eax,eax
0042CD16  85 C9              test  ecx,ecx
0042CD18  74 08              je    0042CD22
0042CD1A  8D 51 FF           lea   edx,[ecx-1]
0042CD1D  40                 inc   eax
0042CD1E  23 CA              and   ecx,edx
0042CD20  75 F8              jne   0042CD1A
0042CD22  8B 4D FC           mov   ecx,dword ptr [ebp-4]
0042CD25  85 C9              test  ecx,ecx
0042CD27  74 08              je    0042CD31
0042CD29  8D 51 FF           lea   edx,[ecx-1]
0042CD2C  40                 inc   eax
0042CD2D  23 CA              and   ecx,edx
0042CD2F  75 F8              jne   0042CD29
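To make the correspondence between the two listings and the C source easier to follow, here is a rough sketch that mirrors one counting loop instruction by instruction. The function names count_half and count_both are hypothetical, and the variables are named after the registers purely for illustration; a compiler is of course free to allocate registers differently.

```c
#include <stdint.h>

/* One half of the listing: the test/je guard followed by the
   lea / inc / and / jne loop. eax carries the running count. */
int count_half(uint32_t ecx, int eax)
{
    if (ecx != 0) {                /* test ecx,ecx ; je (skip loop)  */
        uint32_t tmp;
        do {
            tmp = ecx - 1;         /* lea tmp,[ecx-1]                */
            eax++;                 /* inc eax                        */
            ecx &= tmp;            /* and ecx,tmp (clears lowest bit) */
        } while (ecx != 0);        /* jne (back to the lea)          */
    }
    return eax;
}

/* Both halves in sequence, as in the listings. */
int count_both(uint32_t low, uint32_t high)
{
    int eax = 0;                   /* xor eax,eax */
    eax = count_half(low, eax);
    eax = count_half(high, eax);
    return eax;
}
```

Both listings contain exactly this loop twice. The only difference is how the two 32-bit halves reach ecx: directly from registers in the C version, through [ebp-8] and [ebp-4] in the inline asm version.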