Computer Chess Club Archives


Subject: GCC annihilating VISUAL C++ ==> branchless code in 2003?

Author: Vincent Diepeveen

Date: 05:59:08 02/28/03

On February 27, 2003 at 15:35:34, Russell Reagan wrote:

Now I put it in a separate file, tryx.c:

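int board[64];

void branchless(void) {
  int target;             /* never initialized in this test case: the MSVC
                             listing reads whatever is in the stack slot,
                             the GCC listing whatever is already in edx */
  do {
    int xij=board[target]-5;
    target += 8;
    if( xij )             /* the branch under test */
      target = 64;        /* forces the loop to exit */
  } while( target < 64 );
}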
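As posted, branchless() reads an uninitialized target (undefined behaviour in C) and has no observable result, so it is only useful for looking at the generated code. A minimal sketch of a usable variant keeps the same loop body but assumes a hypothetical start-square parameter and a hypothetical return value, the number of squares examined; the name branchless_count is made up for illustration:

```c
int board[64];

/* Hypothetical variant of branchless(): target is initialized from a
   caller-supplied start square and the number of squares examined is
   returned. The loop body is unchanged. Requires 0 <= start < 64. */
int branchless_count(int start)
{
    int target = start;
    int visited = 0;
    do {
        int xij = board[target] - 5;
        target += 8;            /* 8 squares further: same file, next rank */
        visited++;
        if (xij)
            target = 64;        /* square differs from 5: force the exit */
    } while (target < 64);
    return visited;
}
```

For example, with every square set to 5 except board[20] = 0, branchless_count(4) examines squares 4, 12 and 20 and returns 3.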

Output from Visual C++ 6.0 SP4 with the Processor Pack:

cl -O2 -G6 -c tryx.c
dumpbin /disasm tryx.obj

_branchless:
  00000000: 51                 push        ecx
  00000001: 8B 44 24 00        mov         eax,dword ptr [esp]
  00000005: 8D 0C 85 00 00 00  lea         ecx,[eax*4]
            00
  0000000C: 8D 64 24 00        lea         esp,[esp]
  00000010: 8B 01              mov         eax,dword ptr [ecx]
  00000012: 83 E8 05           sub         eax,5
  00000015: 83 C1 20           add         ecx,20h
  00000018: 85 C0              test        eax,eax
  0000001A: 75 08              jne         00000024
  0000001C: 81 F9 00 01 00 00  cmp         ecx,100h
  00000022: 7C EC              jl          00000010
  00000024: 59                 pop         ecx
  00000025: C3                 ret

So we see that the 'if( xij )' compiles to a conditional branch (the jne at
offset 1A), which shows how outdated Visual C++ is getting nowadays.
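Note that the jne at offset 1A jumps straight to the pop ecx/ret epilogue at offset 24: since target is a local that nothing reads after the loop, MSVC has turned 'target = 64' into an early exit instead of storing 64 and re-testing. A small C sketch of that equivalence, using two hypothetical helpers (trips_assign, trips_break) that take the board and start square as parameters and count loop iterations so the two forms can be compared:

```c
/* Hypothetical helper: the loop as written in tryx.c, with the board and
   start square passed in and the iteration count returned. */
int trips_assign(const int b[64], int target)
{
    int n = 0;
    do {
        int xij = b[target] - 5;
        target += 8;
        n++;
        if (xij)
            target = 64;        /* makes the loop test fail */
    } while (target < 64);
    return n;
}

/* Hypothetical helper: same loop with the store replaced by an early
   exit, which is what MSVC's jne to the epilogue amounts to. */
int trips_break(const int b[64], int target)
{
    int n = 0;
    do {
        int xij = b[target] - 5;
        target += 8;
        n++;
        if (xij)
            break;              /* leave the loop immediately */
    } while (target < 64);
    return n;
}
```

Both helpers run the same number of iterations for every start square from 0 to 63; only the final value of target differs, and nothing reads it after the loop.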

Visual C++ 7.0 .NET miscompiles DIEP's code and, more importantly (I can
work around the bugs if needed), the code it produces for DIEP is 2% slower
than that of Visual C++ 6.0 SP4 with the Processor Pack, so I deleted it from
my hard disk as far as it allowed me to. Regrettably it behaves like a virus,
so I still suffer from its remains when I want to debug.

So I cannot test how the COMMERCIALLY SOLD 7.0 .NET does on this sample
(beta versions of compilers only count when it is mentioned that they are
betas, and which version). But for sure .NET 7.0 sucks in the average case
even more than Visual C++ 6.0 SP4 with the Processor Pack.

Next is the GCC output for this file:

	.file	"tryx.c"
	.text
	.p2align 4,,15
	.globl branchless
	.type	branchless, @function
branchless:
	pushl	%ebp
	movl	%esp, %ebp
	.p2align 4,,7
.L2:
	movl	board(,%edx,4), %eax
	addl	$8, %edx
	subl	$5, %eax
	testl	%eax, %eax
	movl	$64, %eax
	cmovne	%eax, %edx
	cmpl	$63, %edx
	jle	.L2
	leave
	ret
	.size	branchless, .-branchless
	.comm	board,256,32
	.ident	"GCC: (GNU) 3.3 20021230 (prerelease)"

We see that GCC 3.x does very well on this sample: it has optimized away
the branch for 'if( xij )' by using a cmov* instruction (cmovne).
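The testl/movl/cmovne sequence computes target = xij ? 64 : target + 8 without a jump. The same select can also be written in plain C with a bit mask. This is a sketch with hypothetical names (select_target, scan_branch_free); it assumes two's-complement int, and whether a particular compiler turns it into cmov, setcc/neg or plain ALU instructions is up to that compiler:

```c
/* Hypothetical helper: branch-free select, returns 64 when xij != 0,
   otherwise target. Assumes two's-complement int, so -1 has all bits set. */
int select_target(int xij, int target)
{
    int mask = -(xij != 0);               /* -1 if xij != 0, else 0 */
    return (64 & mask) | (target & ~mask);
}

/* Hypothetical rewrite of the tryx.c loop around the select: the body has
   no data-dependent branch, only the loop test remains. Returns the number
   of squares examined. Requires 0 <= target < 64 on entry. */
int scan_branch_free(const int b[64], int target)
{
    int n = 0;
    do {
        int xij = b[target] - 5;
        target = select_target(xij, target + 8);
        n++;
    } while (target < 64);
    return n;
}
```

Writing the select explicitly does not force any compiler to emit CMOV; it only removes the if from the source so the compiler has no branch to keep.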

However, I am bad at reading GCC-generated assembly (its AT&T syntax looks
SO VERY UGLY, similar to ChessBase's new PGN format), and it seems to me that
this code could be optimized further. I see no need to reuse eax for both
board[target] and the constant 64 on every iteration; with a third register,
64 could be loaded once before the loop. It uses just 2 registers (eax and
edx) in the loop, whereas even the very old MSVC is already using 3.

That means that on the Opteron, the Itanium 2 and other processors with more
than 8 GPRs, GCC will of course suck major ass. It doesn't even know how to
use more than 2 registers!

But in this example it does compile the 'if' *branchless*.

So I can hardly wait for a Visual C++ edition that uses CMOV* instructions
and uses profile information to optimize branches.

So in 1 small example we see the strength of the new generations of
processors since the Pentium Pro (Pentium Pro, Klamath and newer), the
weakness of the software (Visual C++ 6.0, even with its service packs, still
does not use P6 instructions, although the Pentium Pro came out back in 1995),
and the general inefficiency of the GNU world, which doesn't live by "640KB
should be enough RAM" but instead still lives by the lemma "2 registers will
do".

Best regards,
Vincent Diepeveen
diep@xs4all.nl

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.