Author: Vincent Diepeveen
Date: 05:59:08 02/28/03
On February 27, 2003 at 15:35:34, Russell Reagan wrote:
Now I have put it in a separate file, tryx.c:
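int board[64];
void branchless(void) {
    int target;   /* left uninitialized in this test; the compiler output reflects that */
    do {
        int xij = board[target] - 5;
        target += 8;
        if( xij )   /* the conditional under test */
            target = 64;
    } while( target < 64 );
}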
output from visual c++ 6.0 sp4 procpack:
cl -O2 -G6 -c tryx.c
dumpbin /disasm tryx.obj
_branchless:
00000000: 51 push ecx
00000001: 8B 44 24 00 mov eax,dword ptr [esp]
00000005: 8D 0C 85 00 00 00 lea ecx,[eax*4]
00
0000000C: 8D 64 24 00 lea esp,[esp]
00000010: 8B 01 mov eax,dword ptr [ecx]
00000012: 83 E8 05 sub eax,5
00000015: 83 C1 20 add ecx,20h
00000018: 85 C0 test eax,eax
0000001A: 75 08 jne 00000024
0000001C: 81 F9 00 01 00 00 cmp ecx,100h
00000022: 7C EC jl 00000010
00000024: 59 pop ecx
00000025: C3 ret
So we see that the 'if( xij )' generated 1 branch (the jne at 0000001A), which
shows how outdated Visual C++ is getting nowadays.
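For a compiler that does not emit CMOV, the conditional can also be removed at the source level. The following is only a minimal sketch of one common workaround, not something from the test above: it selects between the two candidate values with a bit mask. The names select_target and scan_with_mask are hypothetical, the start index is passed in and the final index is returned (both are assumptions added so the result can be checked), and the mask trick assumes two's complement ints.

```c
#include <assert.h>

/* Hypothetical helper (not from the post): yields 64 when xij is
   nonzero and next otherwise, without an if in the source. */
int select_target(int xij, int next)
{
    /* mask is -1 (all bits set) when xij != 0, and 0 when xij == 0;
       this assumes two's complement ints */
    int mask = -(xij != 0);
    return (64 & mask) | (next & ~mask);
}

/* The loop from tryx.c with the start index passed in and the final
   index returned; both are assumptions added for this sketch. */
int scan_with_mask(const int *board, int target)
{
    assert(target >= 0 && target < 64);
    do {
        int xij = board[target] - 5;
        target = select_target(xij, target + 8);
    } while (target < 64);
    return target;
}
```

With every board entry equal to 5, scan_with_mask(board, 3) walks the indices 3, 11, ..., 59 and returns 67; as soon as one entry on the way differs from 5 it returns 64. Whether this beats a CMOV or even a well-predicted branch depends on the compiler and the processor, so it would have to be measured.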
Visual C++ 7.0 .NET creates bugs in DIEP's code and, more importantly (I can
work around the bugs if needed), it is 2% slower for DIEP than Visual C++ 6.0
SP4 procpack, so I deleted it from my hard disk as far as it allowed me.
Regrettably it behaves like a virus, so I still suffer from the remains
when I want to debug.
So I cannot test how the COMMERCIALLY SOLD 7.0 .NET does on this sample
(beta versions of compilers count only when it is mentioned that they are
betas and which version they are). But for sure 7.0 .NET sucks in the average
case even more than Visual C++ 6.0 SP4 procpack.
Next is GCC output of this file:
.file "tryx.c"
.text
.p2align 4,,15
.globl branchless
.type branchless, @function
branchless:
pushl %ebp
movl %esp, %ebp
.p2align 4,,7
.L2:
movl board(,%edx,4), %eax
addl $8, %edx
subl $5, %eax
testl %eax, %eax
movl $64, %eax
cmovne %eax, %edx
cmpl $63, %edx
jle .L2
leave
ret
.size branchless, .-branchless
.comm board,256,32
.ident "GCC: (GNU) 3.3 20021230 (prerelease)"
We see that GCC 3.xx is doing very well on this sample. It has optimized away
a branch by using a cmov* instruction.
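To make the GCC listing easier to follow, here is a rough C rendering of its loop, one statement per instruction. It is only a reading aid under stated assumptions: the function name gcc_loop_model and the edx_start parameter are hypothetical (the original leaves the index uninitialized), and the local variables are named after the registers they stand for.

```c
#include <assert.h>

/* Hypothetical C model of the GCC loop; variable names mirror the
   registers in the listing. edx_start is an added parameter. */
int gcc_loop_model(const int *board, int edx_start)
{
    int edx = edx_start;
    int eax;
    int nonzero;

    assert(edx >= 0 && edx < 64);
    do {
        eax = board[edx];           /* movl board(,%edx,4), %eax */
        edx += 8;                   /* addl $8, %edx             */
        eax -= 5;                   /* subl $5, %eax             */
        nonzero = (eax != 0);       /* testl %eax, %eax          */
        eax = 64;                   /* movl $64, %eax (mov leaves the flags alone) */
        edx = nonzero ? eax : edx;  /* cmovne %eax, %edx         */
    } while (edx <= 63);            /* cmpl $63, %edx / jle .L2  */
    return edx;
}
```

The only jump left is the loop's own back edge; the 'if( xij )' has become a data dependency through cmovne. Note that eax first holds the board entry and is then reused for the constant 64, which is why the constant is reloaded on every pass.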
I am bad, however, at reading GCC-generated assembly (it looks SO VERY UGLY,
similar to the new PGN format of chessbase), and it seems to me that this code
can be optimized further. I see no need to reload eax with the constant 64 on
every iteration. It is using just 2 registers, whereas the very old MSVC is
already using 3.
This means that on the Opteron, Itanium2 and similar processors with more than
8 GPRs, the GCC compiler will of course suck major ass. It doesn't even know
how to use more than 2 registers!
But in this example it is doing things *branchless*.
So I actually can't wait for a Visual C++ edition that uses CMOV* instructions
and uses profile info to optimize branches.
So in 1 small example we see the strength of the new generations of
processors (Pentium Pro/Klamath and newer), the weakness of the software
(Visual C++ 6.0, even with service packs, still does not use P6 instructions,
although the Pentium Pro was released in 1995 already), and the general
inefficiency of the GNU world, which isn't using "640KB should be enough
RAM" but instead still uses the motto "2 registers will do".
Best regards,
Vincent Diepeveen
diep@xs4all.nl