Author: Vincent Diepeveen
Date: 05:59:08 02/28/03
On February 27, 2003 at 15:35:34, Russell Reagan wrote:

I have now put it in a separate file, tryx.c:

    int board[64];

    void branchless(void) {
      int target;   /* never initialized; both listings below reflect that */
      do {
        int xij = board[target] - 5;
        target += 8;
        if( xij )
          target = 64;
      } while( target < 64 );
    }

Output from Visual C++ 6.0 SP4 procpack:

    cl -O2 -G6 -c tryx.c
    dumpbin /disasm tryx.obj

    _branchless:
      00000000: 51                 push        ecx
      00000001: 8B 44 24 00        mov         eax,dword ptr [esp]
      00000005: 8D 0C 85 00 00 00  lea         ecx,[eax*4]
                00
      0000000C: 8D 64 24 00        lea         esp,[esp]
      00000010: 8B 01              mov         eax,dword ptr [ecx]
      00000012: 83 E8 05           sub         eax,5
      00000015: 83 C1 20           add         ecx,20h
      00000018: 85 C0              test        eax,eax
      0000001A: 75 08              jne         00000024
      0000001C: 81 F9 00 01 00 00  cmp         ecx,100h
      00000022: 7C EC              jl          00000010
      00000024: 59                 pop         ecx
      00000025: C3                 ret

So we see that the 'if( xij )' generated 1 branch, which shows how outdated Visual C++ is getting nowadays.

Visual C++ 7.0 .NET creates bugs in DIEP's code and, more importantly (I can work around the bugs if needed), it is 2% slower for DIEP than Visual C++ 6.0 SP4 procpack, so I deleted it from my hard disk as far as it allowed me. Regrettably it behaves like a virus, so I still suffer from the remains when I want to debug.

So I cannot test how the COMMERCIALLY SOLD 7.0 .NET does on this sample (beta versions of compilers count only when it is mentioned that they are beta versions, and which version). But for sure .NET 7.0 sucks in the average case even more than Visual C++ 6.0 SP4 procpack.

Next is the GCC output for this file:

            .file   "tryx.c"
            .text
            .p2align 4,,15
    .globl branchless
            .type   branchless, @function
    branchless:
            pushl   %ebp
            movl    %esp, %ebp
            .p2align 4,,7
    .L2:
            movl    board(,%edx,4), %eax
            addl    $8, %edx
            subl    $5, %eax
            testl   %eax, %eax
            movl    $64, %eax
            cmovne  %eax, %edx
            cmpl    $63, %edx
            jle     .L2
            leave
            ret
            .size   branchless, .-branchless
            .comm   board,256,32
            .ident  "GCC: (GNU) 3.3 20021230 (prerelease)"

We see that GCC 3.xx is doing very well on this sample. It has optimized away a branch by using a cmov* instruction.
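The cmov replacement described above can also be written out by hand in C. What follows is only a rough sketch under a few assumptions: select_if and scan_column are made-up names for illustration and are not part of tryx.c, the start square is passed in instead of being left uninitialized, and the loop returns its iteration count so the behaviour can be checked. Whether a given compiler turns the select into a cmov, into plain and/or arithmetic, or back into a branch is entirely up to the compiler.

```c
#include <stddef.h>

enum { BOARD_SIZE = 64, STRIDE = 8, WANTED = 5 };

/* Hypothetical helper (not in tryx.c): returns b when cond is non-zero
   and a when cond is zero, using a bit mask instead of an if statement.
   Assumes two's complement ints, so that -1 has all bits set. */
static int select_if(int cond, int a, int b)
{
    int mask = -(cond != 0);            /* -1 if cond != 0, otherwise 0 */
    return (a & ~mask) | (b & mask);
}

/* Same loop shape as branchless() in tryx.c: step down the board 8 squares
   at a time and stop at the first square that does not hold WANTED.
   Differences (assumptions of this sketch): target comes from the caller,
   and the number of squares inspected is returned. */
static int scan_column(const int *board, int target)
{
    int steps = 0;
    do {
        int xij = board[target] - WANTED;
        target += STRIDE;
        /* replaces: if( xij ) target = 64; */
        target = select_if(xij, target, BOARD_SIZE);
        ++steps;
    } while (target < BOARD_SIZE);
    return steps;
}
```

Calling scan_column(board, 0) on a board whose first column holds only 5s inspects all 8 squares of that column; the only conditional jump left in the source is the loop test itself.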
However, I am bad at reading GCC-generated assembly (it looks SO VERY UGLY, similar to the new PGN format of ChessBase), and it seems to me that this code can be optimized further. I see no need to reload the constant 64 into eax on every iteration. It uses just 2 registers, whereas the very old MSVC already uses 3. That means that on the Opteron, the Itanium2 and similar processors with more than 8 GPRs, the GCC compiler will suck major ass of course. It doesn't even know how to use more than 2 registers!

But in this example it is doing things *branchless*. So I can hardly wait for a Visual C++ edition that uses CMOV* instructions and uses profile info to optimize branches.

So in 1 small example we see both the strength of the new generations of processors released after 1996 (Pentium Pro/Klamath and newer) and the weakness of the software (Visual C++ 6.0, even with its service packs, still does not use P6 instructions although the Pentium Pro was released back in 1996), plus the general inefficiency of the GNU world, which does not go by "640KB should be enough RAM" but instead still goes by the motto "2 registers will do".

Best regards,
Vincent Diepeveen
diep@xs4all.nl
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.