Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: FirstOne/LastOne

Author: Dieter Buerssner

Date: 14:14:01 06/15/04

Go up one level in this thread


On June 15, 2004 at 05:56:05, Gian-Carlo Pascutto wrote:

>On June 14, 2004 at 16:15:17, Dieter Buerssner wrote:
>
>>I collected some suggested routines. See for the source code later. My results
>>on P4 2.53 GHz:
>>
>>With Gcc 3.1
>>
>>Testing static __inline__ FindFirst_GCP
>>sum:  89.909 s for 805305600 calls, 111.646 ns/call
>>With MSVC 7.0 free command line tools
>>Testing static __forceinline FindFirst_GCP
>>sum: 159.459 s for 805305600 calls, 198.011 ns/call
>
>Oi oi oi...seems like 64 bit handling by 32 bit compilers
>still sucks.

Gcc (and ICC) produce reasonable code. Just a bit of shuffling of registers
could be avoided from first sight. Gcc -O2 produces this

_foo:
        pushl   %ebp
        xorl    %ebp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $20, %esp
        movl    $0, 12(%esp)
        .p2align 4,,15
L108:
        movl    _bbs(,%ebp,8), %eax
        movl    _bbs+4(,%ebp,8), %edx
        movl    %eax, %ecx
        negl    %ecx
        movl    %edx, %ebx
        movl    %eax, %esi
        adcl    $0, %ebx
        movl    $-1829284147, %eax
        andl    %ecx, %esi
        movl    %esi, (%esp)
        negl    %ebx
        incl    %ebp
        movl    %edx, %esi
        movl    (%esp), %edi
        andl    %ebx, %esi
        movl    %esi, 4(%esp)
        imull   $66096211, %edi, %edi
        mull    (%esp)
        movl    4(%esp), %eax
        imull   $-1829284147, %eax, %eax
        movl    %edx, %esi
        addl    %edi, %esi
        leal    (%eax,%esi), %ebx
        movl    %ebx, %ecx
        shrl    $26, %ecx
        movzbl  _magictable(%ecx), %eax
        addl    %eax, 12(%esp)
        cmpl    $99, %ebp
        jle     L108
        movl    12(%esp), %eax
        addl    $20, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

with inlining from

unsigned long foo(void)
{
   int i;
   unsigned long n=0;
   for (i=0; i<NBB; i++)
     n += testfunc(bbs[i]);
   return n;
}

Speeding this up over 30% for handtuned assembler seems not easy. So, IMHO, the
compiler does a good job here.

With the manual multiplication the code is a bit better, but not so much faster:

        pushl   %ebp
        movl    _magicl, %ebp
        pushl   %edi
        xorl    %edi, %edi
        pushl   %esi
        xorl    %esi, %esi
        pushl   %ebx
        .p2align 4,,15
L108:
        movl    _bbs(,%esi,8), %eax
        movl    _bbs+4(,%esi,8), %edx
        movl    %eax, %ecx
        negl    %ecx
        movl    %edx, %ebx
        adcl    $0, %ebx
        negl    %ebx
        andl    %ecx, %eax
        andl    %ebx, %edx
        movl    %eax, %ecx
        movl    %edx, %ebx
        mull    %ebp
        incl    %esi
        imull   $66096211, %ecx, %ecx
        imull   $-1829284147, %ebx, %ebx
        addl    %ebx, %ecx
        addl    %edx, %ecx
        shrl    $26, %ecx
        movzbl  _magictable(%ecx), %eax
        addl    %eax, %edi
        cmpl    $99, %esi
        jle     L108
        popl    %ebx
        movl    %edi, %eax
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

I see no way to convince MSVC to produce inline code for the multiplication, it
always calls that __allmul function (it has no problem to produce good code for
32-bit*32-bit -> 64-bit multiplications). The relative slowness seems just to
come from those 3 multiplications for Gcc. Perhaps there are some magic numbers
where doing the multiplication by the constant with the typical
shift/add/sub/lea tricks would be faster. But I doubt it.

Regards,
Dieter




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.