Author: David Rasmussen
Date: 04:24:38 01/19/03
On January 18, 2003 at 19:12:55, Matt Taylor wrote:

>On January 18, 2003 at 15:57:08, David Rasmussen wrote:
>
>>
>>Mmm. That's weird. My bitboard-based program was as fast if not a little bit
>>faster with MSVC7 than with MSVC6. I can't believe it's library calls.
>
>VC 6 and VC 7 both implement complex 64-bit stuff with library calls. More
>primitive ops (add, subtract, compare) are handled inline as they ought to be.
>In the case of constant << count, the compiler -should- be able to generate good
>64-bit emulation code inline.
>
>Back when I was first implementing the LSB scan functions, I compiled the 64-bit
>binary search algorithm using VC 7. There were several ways to implement a
>"round" of the search. VC 7 chose the worst way.
>
>I'm not terribly surprised, though. 64-bit computation isn't common, and the
>compiler likely has not been tuned to produce super 64-bit code.
>

I'm surprised, because from the beginning, MSVC (6 or 7) always produced the
fastest output for me. Way faster than Borland and GCC and Sun. Intel is about
as fast. With profile-guided optimization, Intel wins. So either all other
compilers are doing something even worse, or you are wrong.

>>
>>I don't know, but I guess I should check its generated code.
>>
>
>Probably should. I'm betting they use a library call. If not, they may use a
>combination of shld (slow) and logical ops to accomplish the shift. The assembly
>I posted should be ideal for you as it avoids the table (even though the table
>is small) and runs just as fast. Downside is it doesn't run on old, old machines
>(like original Pentium/original K6).
>

For the table version (MSVC7):

; 56   : INLINE BitBoard Mask(Square square) { return mask[square]; }

        mov     ecx, DWORD PTR _square$[esp-4]
        mov     eax, DWORD PTR ?mask@@3PA_KA[ecx*8]
        mov     edx, DWORD PTR ?mask@@3PA_KA[ecx*8+4]
        ret     0

For the shift version:

; 55   : INLINE BitBoard Mask(Square square) { return BitBoard(1) << square; }

        mov     ecx, DWORD PTR _square$[esp-4]
        mov     eax, 1
        xor     edx, edx
        jmp     __allshl

So it seems you are right :)

With Intel C++, I can't seem to find the definition of the inlined Mask() in the
bitboard.h file, where it was when compiled with MSVC. I don't know where Intel
puts inlined definitions in assembly output. So I made a dummy file and
function, and tried to trick Intel C++. I didn't succeed:

;;; Square s = E4;
;;; ++s;
;;; BitBoard b = Mask(s);
;;; return PopCount(b);

        mov     DWORD PTR [ebp-8], 536870912            ;8.18
        mov     DWORD PTR [ebp-4], 0                    ;8.18

It understands that b is constant, and it has calculated it at compile time.
Making another dummy function (takes a square, returns Mask(square)) reveals
this:

;;; return Mask(s);

        mov     eax, 1                                  ;5.14
        xor     edx, edx                                ;5.14
        mov     ecx, DWORD PTR [esp+4]                  ;5.14
        call    __allshl                                ;5.14
                                ; LOE eax ebx ebp esi edi
.B1.5:                          ; Preds .B1.1
        ret                                             ;5.14
        ALIGN   4

So Intel uses calls too! I wonder how much could be gained here! I will post an
assembly programmer's challenge in another thread :)

/David
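
For anyone who wants to try this at home, here is a minimal C++ sketch of the three ways of building the mask that come up in this exchange: the plain 64-bit shift, the lookup table, and a variant that builds the two 32-bit halves separately so that only 32-bit shifts are needed. The names maskShift, maskHalves, maskTable and checkVariants are made up for illustration, and Square is assumed to be a plain integer in the range 0..63. The halves variant is not the assembly Matt posted (that code is not shown in this message), and whether it actually beats the table or the __allshl call would have to be measured per compiler.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative types: the post only shows the names BitBoard and Square.
using BitBoard = std::uint64_t;
using Square = unsigned;  // assumed to be in the range 0..63

// Variant 1: plain 64-bit shift by a variable count. On a 32-bit target,
// the listings in the post show MSVC7 and Intel C++ routing this through
// the __allshl helper.
inline BitBoard maskShift(Square sq) {
    return BitBoard(1) << sq;
}

// Variant 2: lookup table, as in the "table version" of Mask().
// The table is filled once on first use.
struct MaskTable {
    BitBoard mask[64];
    MaskTable() {
        for (Square s = 0; s < 64; ++s) mask[s] = BitBoard(1) << s;
    }
};

inline BitBoard maskTable(Square sq) {
    static const MaskTable table;
    return table.mask[sq];
}

// Variant 3: build the low and high 32-bit halves separately, so the only
// variable shift is a 32-bit one. The final shift by the constant 32 is
// something a compiler can typically handle without a helper call.
inline BitBoard maskHalves(Square sq) {
    const std::uint32_t bit = std::uint32_t(1) << (sq & 31);
    const std::uint32_t lo = (sq < 32) ? bit : std::uint32_t(0);
    const std::uint32_t hi = (sq < 32) ? std::uint32_t(0) : bit;
    return (BitBoard(hi) << 32) | lo;
}

// Sanity check: all three variants must agree on every square.
inline void checkVariants() {
    for (Square s = 0; s < 64; ++s) {
        assert(maskShift(s) == maskTable(s));
        assert(maskShift(s) == maskHalves(s));
    }
}
```

Calling checkVariants() once at startup confirms the variants agree. As a cross-check against the Intel listing, square 29 (E4 plus one, assuming A1 = 0) gives 536870912 in the low half and 0 in the high half, which matches the two constants the compiler stored.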
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.