Computer Chess Club Archives


Subject: Re: Question for the Crafty/Compiler experts

Author: Dieter Buerssner

Date: 16:18:46 02/17/04


On February 17, 2004 at 16:10:10, Dann Corbit wrote:

How did you create the source? Is it just the output (more or less) of the
preprocessor run? It looks like it, but you still have the #include(s), which
would not be in that output.

Anyway, your assembly seems to strengthen my point. It does not look worse than
the "hand tuned" assembly code from (former versions of) Crafty.

>_bitfiddling PROC NEAR					; COMDAT
>
>; 27   :     unsigned long   u,
>; 28   :                     v;
>; 29   :     v = bb.w[1];
>; 30   :     u = bb.w[0];
>; 31   :     u = (u&0x55555555) + ((u>>1)&0x55555555);
>
>	mov	eax, DWORD PTR _bb$[esp-4]
>	mov	ecx, eax
>	and	eax, 1431655765				; 55555555H
>	shr	ecx, 1
>	and	ecx, 1431655765				; 55555555H
>	add	ecx, eax
>
>; 32   :     v = (v&0x55555555) + ((v>>1)&0x55555555);
>
>	mov	eax, DWORD PTR _bb$[esp]
>	mov	edx, eax
>	and	eax, 1431655765				; 55555555H
>	shr	edx, 1
>	and	edx, 1431655765				; 55555555H
>	add	edx, eax
>
>; 33   :     u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
>; 34   :         + (v & 0x33333333) + ((v >> 2) & 0x33333333);
>
>	mov	eax, ecx
>	shr	eax, 2
>	and	eax, 858993459				; 33333333H
>	push	esi
>	mov	esi, edx
>	shr	esi, 2
>	and	ecx, 858993459				; 33333333H
>	and	esi, 858993459				; 33333333H
>	add	eax, esi
>	add	eax, ecx
>	and	edx, 858993459				; 33333333H
>	add	eax, edx
>
>; 35   :     u = (u&0x0f0f0f0f) + ((u>>4)&0x0f0f0f0f);
>
>	mov	ecx, eax
>	and	eax, 252645135				; 0f0f0f0fH
>	shr	ecx, 4
>	and	ecx, 252645135				; 0f0f0f0fH
>	add	ecx, eax
>
>; 36   :     u += u >> 8;
>
>	mov	eax, ecx
>	shr	eax, 8
>	add	ecx, eax
>
>; 37   :     u += u >> 16;
>
>	mov	eax, ecx
>	shr	eax, 16					; 00000010H
>	add	eax, ecx
>
>; 38   :     return u & 0xff;
>
>	and	eax, 255				; 000000ffH
>	pop	esi
>
>; 39   : }
>
>	ret	0

Fewer instructions here than in x86.s of older Crafty versions, which used the
same idea (and this code will probably run faster). So, why use assembly for it?
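For reference, the C source behind that listing can be pieced together from the source comments interleaved in the assembly (lines 27 to 39 of the original file). This is only a sketch: the type name two_words, the int return type, and the use of uint32_t in place of unsigned long (which was 32 bits wide on that compiler) are my assumptions, not taken from the quoted post.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bitboard type: two 32-bit halves.
   The listing only shows that the argument has a member w[2]. */
typedef struct { uint32_t w[2]; } two_words;

/* Population count of a 64-bit value held as two 32-bit words.
   uint32_t replaces the listing's unsigned long (an assumption). */
int bitfiddling(two_words bb)
{
    uint32_t u, v;
    v = bb.w[1];
    u = bb.w[0];
    /* Each 2-bit field now holds the count of its two bits (0..2). */
    u = (u & 0x55555555) + ((u >> 1) & 0x55555555);
    v = (v & 0x55555555) + ((v >> 1) & 0x55555555);
    /* Merge both words here: each 4-bit field holds at most 8. */
    u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
      + (v & 0x33333333) + ((v >> 2) & 0x33333333);
    /* Each byte holds at most 16. */
    u = (u & 0x0f0f0f0f) + ((u >> 4) & 0x0f0f0f0f);
    /* Fold the four byte counts into the low byte (at most 64). */
    u += u >> 8;
    u += u >> 16;
    return u & 0xff;
}
```

Merging the two halves at the 4-bit step is what saves work: from there on only one word has to be reduced instead of two.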

>_bitfiddling ENDP
>; Function compile flags: /Ogty
>_TEXT	ENDS
>;	COMDAT _dieter_popc
>_TEXT	SEGMENT
>_a$ = 8							; size = 8
>_dieter_popc PROC NEAR					; COMDAT
>
>; 10   :     unsigned long   w;
>; 11   :     int             n = 0;
>; 12   :     w = a.w[0];
>
>	mov	ecx, DWORD PTR _a$[esp-4]
>	xor	eax, eax
>
>; 13   :     if (w)
>
>	test	ecx, ecx
>	je	SHORT $L552
>	npad	6
>$L550:
>
>; 14   :         do
>; 15   :             n++;
>; 16   :         while ((w &= w - 1) != 0);
>
>	lea	edx, DWORD PTR [ecx-1]
>	inc	eax
>	and	ecx, edx
>	jne	SHORT $L550
>$L552:
>
>; 17   :     w = a.w[1];
>
>	mov	ecx, DWORD PTR _a$[esp]
>
>; 18   :     if (w)
>
>	test	ecx, ecx
>	je	SHORT $L556
>$L554:
>
>; 19   :         do
>; 20   :             n++;
>; 21   :         while ((w &= w - 1) != 0);
>
>	lea	edx, DWORD PTR [ecx-1]
>	inc	eax
>	and	ecx, edx
>	jne	SHORT $L554
>$L556:
>
>; 22   :     return n;
>; 23   : }
>
>	ret	0
>_dieter_popc ENDP

Almost exactly the same as the inline assembly you posted a few messages
earlier. So, why use assembly for this? I already mentioned that in many
situations the C code will probably be faster.
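The C source for this one can likewise be read off the source comments in the listing (lines 10 to 23 of the original file). Again only a sketch: the type name two_words, the int return type, and uint32_t in place of the 32-bit unsigned long are assumptions of mine.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bitboard type: two 32-bit halves. */
typedef struct { uint32_t w[2]; } two_words;

/* Population count by clearing the lowest set bit until none remain.
   The loop runs once per set bit, so it is cheap for sparse inputs. */
int dieter_popc(two_words a)
{
    uint32_t w;   /* unsigned long in the original listing */
    int n = 0;

    w = a.w[0];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);   /* w & (w - 1) clears the lowest set bit */

    w = a.w[1];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);

    return n;
}
```

The compiler turned each loop body into lea/inc/and/jne, which is about as tight as a hand-written version of the same loop would be.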

>;=====================================
>; And with SSE2 enabled, we get this:
>;=====================================
[snipped]

I only skimmed this. I don't see a difference from the above (nor would I
expect any). Did you see a difference?

Regards,
Dieter





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.