Author: Dieter Buerssner
Date: 14:14:01 06/15/04
Go up one level in this thread
On June 15, 2004 at 05:56:05, Gian-Carlo Pascutto wrote:
>On June 14, 2004 at 16:15:17, Dieter Buerssner wrote:
>
>>I collected some suggested routines. See for the source code later. My results
>>on P4 2.53 GHz:
>>
>>With Gcc 3.1
>>
>>Testing static __inline__ FindFirst_GCP
>>sum: 89.909 s for 805305600 calls, 111.646 ns/call
>>With MSVC 7.0 free command line tools
>>Testing static __forceinline FindFirst_GCP
>>sum: 159.459 s for 805305600 calls, 198.011 ns/call
>
>Oi oi oi...seems like 64 bit handling by 32 bit compilers
>still sucks.
Gcc (and ICC) produce reasonable code. Just a bit of shuffling of registers
could be avoided from first sight. Gcc -O2 produces this
_foo:
pushl %ebp
xorl %ebp, %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $20, %esp
movl $0, 12(%esp)
.p2align 4,,15
L108:
movl _bbs(,%ebp,8), %eax
movl _bbs+4(,%ebp,8), %edx
movl %eax, %ecx
negl %ecx
movl %edx, %ebx
movl %eax, %esi
adcl $0, %ebx
movl $-1829284147, %eax
andl %ecx, %esi
movl %esi, (%esp)
negl %ebx
incl %ebp
movl %edx, %esi
movl (%esp), %edi
andl %ebx, %esi
movl %esi, 4(%esp)
imull $66096211, %edi, %edi
mull (%esp)
movl 4(%esp), %eax
imull $-1829284147, %eax, %eax
movl %edx, %esi
addl %edi, %esi
leal (%eax,%esi), %ebx
movl %ebx, %ecx
shrl $26, %ecx
movzbl _magictable(%ecx), %eax
addl %eax, 12(%esp)
cmpl $99, %ebp
jle L108
movl 12(%esp), %eax
addl $20, %esp
popl %ebx
popl %esi
popl %edi
popl %ebp
ret
with inlining from
unsigned long foo(void)
{
int i;
unsigned long n=0;
for (i=0; i<NBB; i++)
n += testfunc(bbs[i]);
return n;
}
Speeding this up over 30% for handtuned assembler seems not easy. So, IMHO, the
compiler does a good job here.
With the manual multiplication the code is a bit better, but not so much faster:
pushl %ebp
movl _magicl, %ebp
pushl %edi
xorl %edi, %edi
pushl %esi
xorl %esi, %esi
pushl %ebx
.p2align 4,,15
L108:
movl _bbs(,%esi,8), %eax
movl _bbs+4(,%esi,8), %edx
movl %eax, %ecx
negl %ecx
movl %edx, %ebx
adcl $0, %ebx
negl %ebx
andl %ecx, %eax
andl %ebx, %edx
movl %eax, %ecx
movl %edx, %ebx
mull %ebp
incl %esi
imull $66096211, %ecx, %ecx
imull $-1829284147, %ebx, %ebx
addl %ebx, %ecx
addl %edx, %ecx
shrl $26, %ecx
movzbl _magictable(%ecx), %eax
addl %eax, %edi
cmpl $99, %esi
jle L108
popl %ebx
movl %edi, %eax
popl %esi
popl %edi
popl %ebp
ret
I see no way to convince MSVC to produce inline code for the multiplication, it
always calls that __allmul function (it has no problem to produce good code for
32-bit*32-bit -> 64-bit multiplications). The relative slowness seems just to
come from those 3 multiplications for Gcc. Perhaps there are some magic numbers
where doing the multiplication by the constant with the typical
shift/add/sub/lea tricks would be faster. But I doubt it.
Regards,
Dieter
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.