Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Tables or Calculation

Author: Matt Taylor

Date: 09:33:29 01/19/03

Go up one level in this thread


On January 19, 2003 at 07:24:38, David Rasmussen wrote:

>On January 18, 2003 at 19:12:55, Matt Taylor wrote:
>
>>On January 18, 2003 at 15:57:08, David Rasmussen wrote:
>>
>>>
>>>Mmm. That's weird. My bitboard-based program was as fast if not a little bit
>>>faster with MSVC7 then with MSVC6. I can't believe it's library calls.
>>
>>VC 6 and VC 7 both implement complex 64-bit stuff with library calls. More
>>primitive ops (add, subtract, compare) are handled inline as they ought to be.
>>In the case of constant << count, the compiler -should- be able to generate good
>>64-bit emulation code inline.
>>
>>Back when I was first implementing the LSB scan functions, I compiled the 64-bit
>>binary search algorithm using VC 7. There were several ways to implement a
>>"round" of the search. VC 7 chose the worst way.
>>
>>I'm not terribly suprised, though. 64-bit computation isn't common, and the
>>compiler likely has not been tuned to produce super 64-bit code.
>>
>
>I'm surprised, because from the beginning, MSVC (6 or 7) always produced the
>fastest output for me. Way faster than Borland and GCC and Sun. Intel is about
>as fast. With profiler guided optimization, Intel wins. So either all other
>compilers are doing something even worse, or you are wrong.

I can't speak for the others. I have used Borland and GCC, but I haven't worked
much with their assembly output. (As for Borland, I have -never- worked with its
assembly output.)

I know some things get special-cased. For instance:

; 248  : uint64 testmul(uint32 i, uint32 j)
; 249  : {
; 250  : 	return ((uint64) i * (uint64) j);
; 241  : }

	mov	eax, DWORD PTR _i$[esp-4]
	mul	DWORD PTR _j$[esp-4]

If the compiler didn't know that both were 32-bit, it would make the function
call. The same goes for division and modulo operators.

>>>I don't know, but I guess I should check its generated code.
>>>
>>
>>Probably should. I'm betting they use a library call. If not, they may use a
>>combination of shld (slow) and logical ops to accomplish the shift. The assembly
>>I posted should be ideal for you as it avoids the table (even though the table
>>is small) and runs just as fast. Downside is it doesn't run on old, old machines
>>(like original Pentium/original K6).
>>
>
>For the table version (MSVC7):
>
>; 56   : INLINE BitBoard Mask(Square square) { return mask[square]; }
>
>	mov	ecx, DWORD PTR _square$[esp-4]
>	mov	eax, DWORD PTR ?mask@@3PA_KA[ecx*8]
>	mov	edx, DWORD PTR ?mask@@3PA_KA[ecx*8+4]
>	ret	0
>
>For the shift version:
>
>; 55   : INLINE BitBoard Mask(Square square) { return BitBoard(1) << square; }
>
>	mov	ecx, DWORD PTR _square$[esp-4]
>	mov	eax, 1
>	xor	edx, edx
>	jmp	__allshl
>
>So it seems you are right :)
>
>With Intel C++, I can't seem to find the definition of the inlined Mask() in the
>bitboard.h file, where it was when compiler with MSVC. I don't know where Intel
>puts inlined definitions in assembly output. So I made a dummy file and
>function, and tried to trick Intel C++. I didn't succeed:
>
>;;; 	Square s = E4;
>;;; 	++s;
>;;; 	BitBoard b = Mask(s);
>;;; 	return PopCount(b);
>
>        mov       DWORD PTR [ebp-8], 536870912                  ;8.18
>        mov       DWORD PTR [ebp-4], 0                          ;8.18
>
>It understands that b is constant, and it has calculated it at compile time.
>Making another dummy function ( takes square, return Mask(square); ) reveals
>this:
>
>;;; 	return Mask(s);
>
>        mov       eax, 1                                        ;5.14
>        xor       edx, edx                                      ;5.14
>        mov       ecx, DWORD PTR [esp+4]                        ;5.14
>        call      __allshl                                      ;5.14
>                                ; LOE eax ebx ebp esi edi
>.B1.5:                          ; Preds .B1.1
>        ret                                                     ;5.14
>        ALIGN     4
>
>So Intel uses calls too!
>
>I wonder how much could be gained here!
>
>I will post an assembly programmers challenge in another thread :)
>
>/David

Much can be gained with human insight into problems that can be constrained.
It's just too bad that the compiler couldn't see the assumptions I made (that
only one bit is set so the result can be computed twice and adjusted).

I look forward to your challenge!

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.