Author: Andrzej Nagorko
Date: 06:25:07 02/28/03
On February 28, 2003 at 08:59:08, Vincent Diepeveen wrote:
<snip>
>
>Next is GCC output of this file:
>
> .file "tryx.c"
> .text
> .p2align 4,,15
> .globl branchless
> .type branchless, @function
>branchless:
> pushl %ebp
> movl %esp, %ebp
> .p2align 4,,7
>.L2:
> movl board(,%edx,4), %eax
> addl $8, %edx
> subl $5, %eax
> testl %eax, %eax
> movl $64, %eax
> cmovne %eax, %edx
> cmpl $63, %edx
> jle .L2
> leave
> ret
> .size branchless, .-branchless
> .comm board,256,32
> .ident "GCC: (GNU) 3.3 20021230 (prerelease)"
>
<snip>
>So in 1 small example we see both the strength of the new generations of
>processors released after 1996 (pentiumpro/klamath and newer) and the
>weakness of the software (visual c++ 6.0 despite pentiumpro released
>in 1996 already still with service packs not using P6 instructions) and the
>general inefficiency of the GNU world who isn't using "640KB should be enough
>RAM", but instead still is using the lemma "2 registers will do".
>
My gcc produces better code:
.file "tryx.c"
.text
.p2align 4,,15
.globl branchless
.type branchless,@function
branchless:
pushl %ebx
movl $64, %ecx
movl $board, %ebx
.p2align 4,,7
.L2:
movl (%ebx,%edx,4), %eax
addl $8, %edx
subl $5, %eax
testl %eax, %eax
cmovne %ecx, %edx
cmpl $63, %edx
jle .L2
popl %ebx
ret
.Lfe1:
.size branchless,.Lfe1-branchless
.comm board,256,32
.ident "GCC: (GNU) 3.2.3 20030221 (Debian prerelease)"
As you can see, it uses three registers besides the loop index in %edx (%eax, %ebx for the address of board, and %ecx for the constant 64), and it doesn't execute movl $64, %eax inside the loop. Either this is a difference between gcc 3.2.3 and 3.3, or you didn't use the proper optimization switches. I compiled it with
gcc -Wall -O3 -fomit-frame-pointer -march=athlon -mcpu=athlon -funroll-loops -fstrict-aliasing -S tryx.c
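Since tryx.c itself was snipped from the quote, here is only a guess at what the loop does, read back from the two listings: board is 64 ints (.comm board,256), the index steps by 8 (up one file), and cmovne forces the index to 64 as soon as a square is not 5, which ends the loop. The name scan_file, the sq parameter and the returned square are hypothetical additions so the result can be observed; the listings above return nothing useful and read %edx without setting it.

```c
#include <assert.h>

/* 64 ints = 256 bytes, matching ".comm board,256,32" in both listings. */
int board[64];

/* Hypothetical reconstruction of the loop in both listings (tryx.c itself
   was snipped). Starting at sq, walk up the file in steps of 8 and stop at
   the first square whose value is not 5. Returns that square, or -1 if the
   walk runs off the board. The parameter and return value are assumptions
   added so the result can be observed. */
int scan_file(int sq)
{
    int hit = -1;

    assert(sq >= 0 && sq < 64);   /* do-while reads board[sq] before testing */
    do {
        int v = board[sq];        /* movl board(,%edx,4), %eax */
        int next = sq + 8;        /* addl $8, %edx */

        /* Both selects are plain data dependencies with no branch, which is
           the shape GCC can turn into cmov on P6-class targets. */
        hit = (v != 5) ? sq : hit;
        sq  = (v != 5) ? 64 : next;   /* subl/testl/cmovne: force loop exit */
    } while (sq <= 63);               /* cmpl $63, %edx ; jle .L2 */

    return hit;
}
```

For example, with every square set to 5 except board[19] = 0, scan_file(3) visits 3, 11 and 19 and returns 19, while scan_file(4) walks 4, 12, ..., 60 and returns -1. Whether GCC actually emits cmov for the two conditionals depends on the compiler version and on targeting a P6-class CPU (for example -march=i686 or -march=athlon); cmov does not exist on a plain i386, so a -march=i386 build has to branch.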
Andrzej