Computer Chess Club Archives



Subject: Re: is this really faster?

Author: Gerd Isenberg

Date: 06:44:30 04/21/03



On April 19, 2003 at 12:34:22, Dieter Buerssner wrote:

>On April 19, 2003 at 05:29:13, Gerd Isenberg wrote:
>
>>Yes, Dieter, I tried a lot.
>
>Gerd, I have no doubt about this. I only questioned the 2 specific routines (the
>loops on the 2 32-bit parts, once in assembler, once in C). I questioned them
>only because I looked at the generated assembler of the C code, which was
>practically identical to the assembly you showed.
>
>Anyway, I made a small test program. dieter() is the C-Code I had shown, gerd()
>your assembly (I wrote the same in GCC-style, too), and crafty() the routine Uri
>showed in this thread. In my test, gerd() was not faster than dieter() (dieter()
>was actually slightly faster than gerd() using MSVC, and the same using gcc). As
>should be expected, with this specific test, crafty() was a bit slower (much
>more with gcc than with MSVC). I used -O2 for both compilers. No doubt, one can
>argue about my testing procedure. I actually made it out of fun only, not to do
>some scientific experiments. I also used a union for bitboard, and changed for
>this the functions in the thread slightly. I just did not want to try which
>tricks would work best to convince the compiler to extract the 2 32-bit parts
>efficiently.
>
>Regards,
>Dieter
>
>MYINLINE int dieter(BU a)
>{
>  unsigned long w;
>  int n = 0;
>  w = a.w[0];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  w = a.w[1];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  return n;
>}
>


Yes, Dieter, confirmed: your C routine is the fastest. As you already mentioned,
the generated code is the same apart from parameter passing and the registers
used. That is the mess with MSVC inline assembly: it cannot take parameters in
registers, so they go through the stack. Thanks for pointing that out.

Regards,
Gerd


static __forceinline int BitCount8 (BitBoard bb)
{
	register UINT32 w;
	register int n = 0;
	w = LOWBOARD(bb);
	if (w) do n++; while ((w &= w-1) != 0);
	w = HIGHBOARD(bb);
	if (w) do n++; while ((w &= w-1) != 0);
	return n;
}
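
For readers who want to try the routine outside the engine, here is a minimal self-contained sketch. The BitBoard typedef and the LOWBOARD/HIGHBOARD macros are not shown in this thread, so the definitions below are assumptions (a plain 64-bit integer split by cast and shift); bitcount_sparse, bitcount_naive and bitcount_selfcheck are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions: the thread does not show BitBoard, LOWBOARD or HIGHBOARD. */
typedef uint64_t BitBoard;
#define LOWBOARD(bb)  ((uint32_t)(bb))
#define HIGHBOARD(bb) ((uint32_t)((bb) >> 32))

/* Same loop as BitCount8: w &= w - 1 clears the lowest set bit,
   so each loop runs once per set bit in its 32-bit half. */
static int bitcount_sparse(BitBoard bb)
{
    uint32_t w;
    int n = 0;

    w = LOWBOARD(bb);
    if (w) do n++; while ((w &= w - 1) != 0);
    w = HIGHBOARD(bb);
    if (w) do n++; while ((w &= w - 1) != 0);
    return n;
}

/* Reference implementation: test each of the 64 bits in turn. */
static int bitcount_naive(BitBoard bb)
{
    int n = 0;
    for (int i = 0; i < 64; i++)
        n += (int)((bb >> i) & 1u);
    return n;
}

/* Cross-check both routines on a sequence of pseudo-random boards. */
static void bitcount_selfcheck(void)
{
    BitBoard x = 0x0123456789ABCDEFULL;
    for (int i = 0; i < 1000; i++) {
        assert(bitcount_sparse(x) == bitcount_naive(x));
        x = x * 6364136223846793005ULL + 1442695040888963407ULL; /* LCG step */
    }
}
```

The loop cost grows with the number of set bits, which is why this form suits sparsely populated bitboards.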

The assembler output of your inlined C routine, including a few preceding
instructions that prepare the bitboard and an __asm int 3 breakpoint:

0042CC05 81 E6 FE FE FE FE    and         esi,0FEFEFEFEh
0042CC0B 25 FE FE FE 00       and         eax,0FEFEFEh
0042CC10 0B CE                or          ecx,esi
0042CC12 0B D0                or          edx,eax
0042CC14 CC                   int         3

0042CC15 33 C0                xor         eax,eax
0042CC17 85 C9                test        ecx,ecx
0042CC19 74 08                je          0042CC23
0042CC1B 8D 71 FF             lea         esi,[ecx-1]
0042CC1E 40                   inc         eax
0042CC1F 23 CE                and         ecx,esi
0042CC21 75 F8                jne         0042CC1B
0042CC23 5F                   pop         edi
0042CC24 5E                   pop         esi
0042CC25 85 D2                test        edx,edx
0042CC27 8B CA                mov         ecx,edx
0042CC29 5B                   pop         ebx
0042CC2A 74 08                je          0042CC34
0042CC2C 8D 51 FF             lea         edx,[ecx-1]
0042CC2F 40                   inc         eax
0042CC30 23 CA                and         ecx,edx
0042CC32 75 F8                jne         0042CC2C


The inlined asm code, with unnecessary parameter passing via the stack:

0042CCFB 81 E2 FE FE FE FE    and         edx,0FEFEFEFEh
0042CD01 25 FE FE FE 00       and         eax,0FEFEFEh
0042CD06 0B CA                or          ecx,edx
0042CD08 0B D8                or          ebx,eax
0042CD0A 89 4D F8             mov         dword ptr [ebp-8],ecx
0042CD0D 89 5D FC             mov         dword ptr [ebp-4],ebx
0042CD10 CC                   int         3

0042CD11 8B 4D F8             mov         ecx,dword ptr [ebp-8]
0042CD14 33 C0                xor         eax,eax
0042CD16 85 C9                test        ecx,ecx
0042CD18 74 08                je          0042CD22
0042CD1A 8D 51 FF             lea         edx,[ecx-1]
0042CD1D 40                   inc         eax
0042CD1E 23 CA                and         ecx,edx
0042CD20 75 F8                jne         0042CD1A
0042CD22 8B 4D FC             mov         ecx,dword ptr [ebp-4]
0042CD25 85 C9                test        ecx,ecx
0042CD27 74 08                je          0042CD31
0042CD29 8D 51 FF             lea         edx,[ecx-1]
0042CD2C 40                   inc         eax
0042CD2D 23 CA                and         ecx,edx
0042CD2F 75 F8                jne         0042CD29
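
A comparison like the one described in the quoted message could be timed roughly as follows. This is only a sketch and not Dieter's test program, which is not shown here: popcount_fn, popcount_loop and time_popcount are hypothetical names, the input is a simple LCG sequence rather than real chess positions, and clock() gives only coarse CPU time.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical signature shared by the routines under test. */
typedef int (*popcount_fn)(uint64_t);

/* Stand-in candidate: clear the lowest set bit until none remain. */
static int popcount_loop(uint64_t bb)
{
    int n = 0;
    while (bb) {
        n++;
        bb &= bb - 1;
    }
    return n;
}

/* Calls fn on `iterations` pseudo-random boards and returns the elapsed
   CPU time in seconds. The sum of all results is stored in *checksum so
   the compiler cannot discard the calls, and so two routines can be
   checked for identical results on the same input sequence. */
static double time_popcount(popcount_fn fn, long iterations,
                            unsigned long *checksum)
{
    uint64_t x = 0x0123456789ABCDEFULL;
    unsigned long sum = 0;
    clock_t start = clock();

    for (long i = 0; i < iterations; i++) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL; /* LCG step */
        sum += (unsigned long)fn(x);
    }

    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Passing a routine through a function pointer prevents inlining, so a harness like this measures call overhead as well; the listings above compare inlined code, where differences such as stack versus register passing become visible.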





Last modified: Thu, 15 Apr 21 08:11:13 -0700
