Author: Vincent Diepeveen
Date: 07:27:28 02/28/03
On February 28, 2003 at 09:25:07, Andrzej Nagorko wrote:

>On February 28, 2003 at 08:59:08, Vincent Diepeveen wrote:
>
><snip>
>
>>
>>Next is GCC output of this file:
>>
>>        .file   "tryx.c"
>>        .text
>>        .p2align 4,,15
>>        .globl branchless
>>        .type   branchless, @function
>>branchless:
>>        pushl   %ebp
>>        movl    %esp, %ebp
>>        .p2align 4,,7
>>.L2:
>>        movl    board(,%edx,4), %eax
>>        addl    $8, %edx
>>        subl    $5, %eax
>>        testl   %eax, %eax
>>        movl    $64, %eax
>>        cmovne  %eax, %edx
>>        cmpl    $63, %edx
>>        jle     .L2
>>        leave
>>        ret
>>        .size   branchless, .-branchless
>>        .comm   board,256,32
>>        .ident  "GCC: (GNU) 3.3 20021230 (prerelease)"
>>
>
><snip>
>
>>So in 1 small example we see both the strength of the new generations of
>>processors released after 1996 (pentiumpro/klamath and newer) and the
>>weakness of the software (visual c++ 6.0 despite pentiumpro released
>>in 1996 already still with service packs not using P6 instructions) and the
>>general inefficiency of the GNU world who isn't using "640KB should be enough
>>RAM", but instead still is using the lemma "2 registers will do".
>>
>
>  My gcc produces better code:
>
>        .file   "tryx.c"
>        .text
>        .p2align 4,,15
>.globl branchless
>        .type   branchless,@function
>branchless:
>        pushl   %ebx
>        movl    $64, %ecx
>        movl    $board, %ebx
>        .p2align 4,,7
>.L2:
>        movl    (%ebx,%edx,4), %eax
>        addl    $8, %edx
>        subl    $5, %eax
>        testl   %eax, %eax
>        cmovne  %ecx, %edx
>        cmpl    $63, %edx
>        jle     .L2
>        popl    %ebx
>        ret
>.Lfe1:
>        .size   branchless,.Lfe1-branchless
>        .comm   board,256,32
>        .ident  "GCC: (GNU) 3.2.3 20030221 (Debian prerelease)"
>
>  As you see it uses three registers (and doesn't do movl $64, %eax inside
>loop). Either it is difference between gcc 3.2.3 and 3.3 or you didn't use
>proper optimization switches. I compiled it with
>
>gcc -Wall -O3 -fomit-frame-pointer -march=athlon -mcpu=athlon -funroll-loops
>-fstrict-aliasing -S tryx.c
>
>Andrzej

-O3 and the unroll nonsense are a lot slower for DIEP than -O2, so by default I do not use them. I used:

-O2 -march=athlon -mcpu=athlon

If it can't produce that code with those options, then of course it's a hopeless compiler.
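For anyone comparing the two listings: tryx.c itself was snipped from this thread, so below is only a rough C reading of what the loop at .L2 appears to do, not the original source. The function name branchless_sketch, the start parameter and the returned iteration count are all assumptions added so the behaviour can be observed; neither listing initializes %edx, so the real starting index is unknown. The board size of 64 ints follows from ".comm board,256,32" together with the 4-byte scale in the addressing mode.

```c
#define BOARD_SIZE 64

/* ".comm board,256,32" reserves 256 bytes: 64 four-byte ints. */
int board[BOARD_SIZE];

/*
 * Hypothetical reconstruction of the loop at .L2.
 * start must be in 0..63 (assumption: the listings never set %edx).
 * Returns the number of iterations executed, which the original
 * function does not do; it is added here only for inspection.
 */
int branchless_sketch(int start)
{
    int i = start;
    int iterations = 0;

    do {
        int piece = board[i];           /* movl board(,%edx,4), %eax */
        i += 8;                         /* addl $8, %edx             */
        /* subl $5 / testl / cmovne: if piece != 5, force i to 64.
         * A compiler can emit this select as cmov, not as a jump. */
        i = (piece - 5 != 0) ? 64 : i;
        iterations++;
    } while (i <= 63);                  /* cmpl $63, %edx ; jle .L2  */

    return iterations;
}
```

The only difference between the two compiler outputs is where the constant 64 lives: the 3.3 prerelease reloads it into %eax on every pass, while the 3.2.3 build keeps it in %ecx outside the loop. The C-level logic is the same in both.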