Author: Gerd Isenberg
Date: 06:44:30 04/21/03
On April 19, 2003 at 12:34:22, Dieter Buerssner wrote:

>On April 19, 2003 at 05:29:13, Gerd Isenberg wrote:
>
>>Yes Dieter, i tried a lot.
>
>Gerd, I have no doubt about this. I only questioned the 2 specific routines (the
>loops on the 2 32 bit parts, once in assembler, once in C). I questioned it
>only, because I looked at the generated assembler of the C-Code, which was
>practically identical, to the assembly you showed.
>
>Anyway, I made a small test program. dieter() is the C-Code I had shown, gerd()
>your assembly (I wrote the same in GCC-style, too), and crafty() the routine Uri
>showed in this thread. In my test, gerd() was not faster than dieter() (dieter()
>was actually slightly faster than gerd() using MSVC, and the same using gcc). As
>should be expected, with this specific test, crafty() was a bit slower (much
>more with gcc than with MSVC). I used -O2 for both compilers. No doubt, one can
>argue about my testing procedure. I actually made it out of fun only, not to do
>some scientific experiments. I also used a union for bitboard, and changed for
>this the functions in the thread slightly. I just did not want to try which
>tricks would work best to convince the compiler to extract the 2 32-bit parts
>efficiently.
>
>Regards,
>Dieter
>
>MYINLINE int dieter(BU a)
>{
>  unsigned long w;
>  int n = 0;
>  w = a.w[0];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  w = a.w[1];
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  return n;
>}
>

Yes, Dieter, confirmed - your C-routine is fastest. As you already mentioned, it's essentially the same code, apart from parameter passing and the registers used. That's the mess with MSC inline assembly: it is not able to pass parameters via register. Thanks for pointing that out.

Regards,
Gerd

static __forceinline int BitCount8 (BitBoard bb)
{
  register UINT32 w;
  register int n = 0;
  w = LOWBOARD(bb);
  if (w)
    do
      n++;
    while ((w &= w-1) != 0);
  w = HIGHBOARD(bb);
  if (w)
    do
      n++;
    while ((w &= w-1) != 0);
  return n;
}

the assembler output of your inlined c-routine, including some prefix instructions preparing the bitboard and __asm int 3 breakpoint:

0042CC05 81 E6 FE FE FE FE  and   esi,0FEFEFEFEh
0042CC0B 25 FE FE FE 00     and   eax,0FEFEFEh
0042CC10 0B CE              or    ecx,esi
0042CC12 0B D0              or    edx,eax
0042CC14 CC                 int   3
0042CC15 33 C0              xor   eax,eax
0042CC17 85 C9              test  ecx,ecx
0042CC19 74 08              je    0042CC23
0042CC1B 8D 71 FF           lea   esi,[ecx-1]
0042CC1E 40                 inc   eax
0042CC1F 23 CE              and   ecx,esi
0042CC21 75 F8              jne   0042CC1B
0042CC23 5F                 pop   edi
0042CC24 5E                 pop   esi
0042CC25 85 D2              test  edx,edx
0042CC27 8B CA              mov   ecx,edx
0042CC29 5B                 pop   ebx
0042CC2A 74 08              je    0042CC34
0042CC2C 8D 51 FF           lea   edx,[ecx-1]
0042CC2F 40                 inc   eax
0042CC30 23 CA              and   ecx,edx
0042CC32 75 F8              jne   0042CC2C

the inlined asm code with unnecessary parameter passing via stack!

0042CCFB 81 E2 FE FE FE FE  and   edx,0FEFEFEFEh
0042CD01 25 FE FE FE 00     and   eax,0FEFEFEh
0042CD06 0B CA              or    ecx,edx
0042CD08 0B D8              or    ebx,eax
0042CD0A 89 4D F8           mov   dword ptr [ebp-8],ecx
0042CD0D 89 5D FC           mov   dword ptr [ebp-4],ebx
0042CD10 CC                 int   3
0042CD11 8B 4D F8           mov   ecx,dword ptr [ebp-8]
0042CD14 33 C0              xor   eax,eax
0042CD16 85 C9              test  ecx,ecx
0042CD18 74 08              je    0042CD22
0042CD1A 8D 51 FF           lea   edx,[ecx-1]
0042CD1D 40                 inc   eax
0042CD1E 23 CA              and   ecx,edx
0042CD20 75 F8              jne   0042CD1A
0042CD22 8B 4D FC           mov   ecx,dword ptr [ebp-4]
0042CD25 85 C9              test  ecx,ecx
0042CD27 74 08              je    0042CD31
0042CD29 8D 51 FF           lea   edx,[ecx-1]
0042CD2C 40                 inc   eax
0042CD2D 23 CA              and   ecx,edx
0042CD2F 75 F8              jne   0042CD29

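For anyone who wants to try the routine outside the engine, here is a minimal portable sketch of the same loop. The post does not show how BitBoard, UINT32, LOWBOARD or HIGHBOARD are defined, so the typedefs and macros below are assumptions, and BitCountNaive and BitCountSelfCheck are hypothetical helpers added only for cross-checking. The MSVC-specific __forceinline and the register keyword are left out so it should build with any C99 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions: the post does not show these, so they are
   illustrative stand-ins, not the engine's real declarations. */
typedef uint64_t BitBoard;
typedef uint32_t UINT32;
#define LOWBOARD(bb)  ((UINT32)(bb))
#define HIGHBOARD(bb) ((UINT32)((bb) >> 32))

/* Same loop as in the post: w &= w-1 clears the lowest set bit,
   so each loop runs once per set bit in its 32-bit half. */
static int BitCount8(BitBoard bb)
{
    UINT32 w;
    int n = 0;

    w = LOWBOARD(bb);
    if (w)
        do n++; while ((w &= w - 1) != 0);
    w = HIGHBOARD(bb);
    if (w)
        do n++; while ((w &= w - 1) != 0);
    return n;
}

/* Hypothetical reference count that inspects every bit position;
   slow, used only to cross-check BitCount8. */
static int BitCountNaive(BitBoard bb)
{
    int n = 0;
    for (; bb != 0; bb >>= 1)
        n += (int)(bb & 1);
    return n;
}

/* Hypothetical self-check: both counts must agree on a few samples,
   including the 0x00FEFEFE / 0xFEFEFEFE mask pair from the listing. */
static void BitCountSelfCheck(void)
{
    static const BitBoard samples[] = {
        0, 1, 0x8000000000000000ULL,
        0x00FEFEFEFEFEFEFEULL, 0xFFFFFFFFFFFFFFFFULL
    };
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(BitCount8(samples[i]) == BitCountNaive(samples[i]));
}
```

Calling BitCountSelfCheck() once from main() is enough to exercise it. Since the loop cost grows with the number of set bits, this form presumably suits the sparse bitboards discussed in the thread better than dense ones.
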
Last modified: Thu, 15 Apr 21 08:11:13 -0700