Computer Chess Club Archives



Subject: Re: GCC options

Author: Vincent Diepeveen

Date: 07:27:28 02/28/03



On February 28, 2003 at 09:25:07, Andrzej Nagorko wrote:

-O3 and the unroll nonsense are a lot slower for DIEP
than -O2, so I do not use them by default.

I used:

  -O2 -march=athlon -mcpu=athlon

If GCC can't produce that better code with those options, then it is of
course a hopeless compiler.

>On February 28, 2003 at 08:59:08, Vincent Diepeveen wrote:
>
><snip>
>
>>
>>Next is GCC output of this file:
>>
>>	.file	"tryx.c"
>>	.text
>>	.p2align 4,,15
>>        .globl branchless
>>	.type	branchless, @function
>>branchless:
>>	pushl	%ebp
>>	movl	%esp, %ebp
>>	.p2align 4,,7
>>.L2:
>>        movl	board(,%edx,4), %eax
>>	addl	$8, %edx
>>	subl	$5, %eax
>>	testl	%eax, %eax
>>	movl	$64, %eax
>>	cmovne	%eax, %edx
>>	cmpl	$63, %edx
>>	jle	.L2
>>	leave
>>	ret
>>	.size	branchless, .-branchless
>>	.comm	board,256,32
>>	.ident	"GCC: (GNU) 3.3 20021230 (prerelease)"
>>
>
><snip>
>
>>So in 1 small example we see both the strength of the new generations of
>>processors released after 1996 (pentiumpro/klamath and newer) and the
>>weakness of the software (visual c++ 6.0 despite pentiumpro released
>>in 1996 already still with service packs not using P6 instructions) and the
>>general inefficiency of the GNU world who isn't using "640KB should be enough
>>RAM", but instead still is using the lemma "2 registers will do".
>>
>
>  My gcc produces better code:
>
>        .file   "tryx.c"
>        .text
>        .p2align 4,,15
>.globl branchless
>        .type   branchless,@function
>branchless:
>        pushl   %ebx
>        movl    $64, %ecx
>        movl    $board, %ebx
>        .p2align 4,,7
>.L2:
>        movl    (%ebx,%edx,4), %eax
>        addl    $8, %edx
>        subl    $5, %eax
>        testl   %eax, %eax
>        cmovne  %ecx, %edx
>        cmpl    $63, %edx
>        jle     .L2
>        popl    %ebx
>        ret
>.Lfe1:
>        .size   branchless,.Lfe1-branchless
>        .comm   board,256,32
>        .ident  "GCC: (GNU) 3.2.3 20030221 (Debian prerelease)"
>
>  As you see, it uses three registers (and doesn't do movl $64, %eax inside the
>loop). Either it is a difference between gcc 3.2.3 and 3.3, or you didn't use
>the proper optimization switches. I compiled it with
>
>gcc -Wall -O3 -fomit-frame-pointer -march=athlon -mcpu=athlon -funroll-loops
>-fstrict-aliasing -S tryx.c
>
>Andrzej
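
The C source of tryx.c is snipped from the quote, so here is a rough sketch of a loop that could compile to listings like the two above. It is reconstructed from the assembly only and is not the original file: the names `sq` and `piece`, the `int sq` parameter and the return value are assumptions added so the function can be called and checked (in both listings %edx is never initialized before the loop). Only `board`, `branchless`, the constants 5, 8, 63 and 64 and the 256-byte size of `board` come from the listings.

```c
/* Hypothetical reconstruction of tryx.c, inferred from the assembly only. */

int board[64];  /* ".comm board,256,32": 256 bytes, i.e. 64 four-byte ints */

/*
 * Step through board[] in strides of 8 for as long as each square holds 5.
 * The first square holding anything else forces sq to 64, which ends the loop.
 * The conditional assignment is the part GCC turned into cmovne instead of
 * a conditional jump.
 *
 * Assumption: the caller passes 0 <= sq <= 63.
 */
int branchless(int sq)
{
    do {
        int piece = board[sq];          /* movl board(,%edx,4), %eax */
        sq += 8;                        /* addl $8, %edx             */
        sq = (piece != 5) ? 64 : sq;    /* subl/testl + cmovne       */
    } while (sq <= 63);                 /* cmpl $63, %edx ; jle .L2  */

    return sq;  /* assumption: the listings do not show a return value */
}
```

Compiling a file like this with -S and the switches quoted in this thread should show whether the constant 64 is loaded once before the loop (as in the 3.2.3 output) or reloaded on every iteration (as in the 3.3 prerelease output). The exact output will depend on the compiler version.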




Last modified: Thu, 15 Apr 21 08:11:13 -0700
