Author: Eugene Nalimov
Date: 23:14:58 11/27/03
I don't think you need assembler code in Crafty on Opteron. The only functions that can benefit from it are FirstOne()/LastOne(). On Visual C I used intrinsics that expand to bsf/bsr, but IIRC the gain was ~1%.

Thanks,
Eugene

On November 27, 2003 at 13:48:22, Robert Hyatt wrote:

>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>
>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>
>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>system I am playing with. And I must say that after studying the Opteron
>>>64 bit instruction set, I'm impressed.
>>>
>>>First, all the old opcodes work.. mov, sub, bsf, etc..
>>>
>>>Second, the familiar 8 32-bit regs are still there. But they can be
>>>named %rax rather than %eax to stretch them to 64 bits. Cute. And
>>>then there are 8 more registers you can use with the same old opcodes
>>>and addressing modes.
>>>
>>>In short, it's well-thought-out and very easy to use. I'll post some
>>>performance later. I have PopCnt(), FirstOne() and LastOne() working
>>>fine. After I finish the others, I'll see how much (if any) it speeds
>>>things up.
>>
>>Hi Bob,
>>
>>Very curious about your 64-bit results...
>
>First results are not good. FirstOne() and LastOne() in asm are actually
>slower than the normal table-lookup in the C source. I would suspect it has
>to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
>inlined while the asm is coded as external procedures...
>
>>
>>I see good chances for Matt Taylor's de Bruijn multiplication to become faster
>>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle vector path
>>instruction, but 32*32=64bit or 64*64=128bit became direct path instructions on
>>AMD64 (3/5 cycles). Maybe you can try it some day...
>
>If you have anything that runs under linux, I can run it on this box easily.
>
>>
>>The additional 8 gp registers r8-r15 (64-bit) or r8D-r15D (32-bit) may even be
>>used as addressing registers (with REX prefix).
>>
>>Be aware of the "sign extension" penalty if using signed 32-bit variables as
>>array indices; "long" is still 32-bit with msc but 64-bit in gcc:
>
>I know. I had to experiment a bit. Pointers = 64 bits, ints=32 bits,
>longs=64 bits, using the gcc 3.2 compiler distributed on this Suse linux
>distribution AMD is running on the machine I am playing with.
>
>>
>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>
>>2.22 Using Unsigned Integers for 32-Bit Array Indices
>>
>>Optimization
>>When using a 32-bit variable as an array index, declare the variable as an
>>unsigned integer instead of a signed integer.
>>
>>Application
>>This optimization applies to 64-bit software.
>>
>>Rationale
>>When performing 64-bit address arithmetic, the compiler must insert an
>>additional instruction into the object code to sign-extend a signed 32-bit
>>integer, which reduces performance; no additional instruction is necessary to
>>zero-extend an unsigned 32-bit integer because the processor performs
>>zero-extension automatically. (There is no performance penalty for using 64-bit
>>variables, either signed or unsigned, as array indices.)
>>
>>Cheers,
>>Gerd
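
For readers unfamiliar with the de Bruijn multiplication Gerd mentions, here is a minimal C sketch of the general idea. It uses the plain 64*64 multiply variant, not Matt Taylor's folded 32-bit routine, and it is not Crafty's code. The names `DEBRUIJN64`, `init_debruijn` and `lsb_index` are made up for illustration, and bit 0 is taken to be the least significant bit, which may not match the numbering FirstOne()/LastOne() use.

```c
#include <assert.h>
#include <stdint.h>

/* A commonly used 64-bit de Bruijn constant: each of the 64 possible
   6-bit windows appears exactly once (assumed here, checked by the table). */
#define DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

/* Hypothetical lookup table mapping the top 6 bits of the product
   back to the bit index that produced them. */
static int debruijn_table[64];

/* Fill the table once before calling lsb_index(). Multiplying by
   (1 << i) is the same as shifting the constant left by i. */
void init_debruijn(void) {
    for (int i = 0; i < 64; i++)
        debruijn_table[(DEBRUIJN64 << i) >> 58] = i;
}

/* Index of the least significant set bit; bb must be non-zero. */
int lsb_index(uint64_t bb) {
    assert(bb != 0);
    uint64_t isolated = bb & (~bb + 1);   /* keep only the lowest set bit */
    return debruijn_table[(isolated * DEBRUIJN64) >> 58];
}
```

After `init_debruijn()`, a call such as `lsb_index(0x28)` looks up the index of the lowest set bit with one multiply, one shift and one table read. Whether that beats bsf on a given CPU is exactly the question raised in the thread and would need measuring.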
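
The AMD guideline quoted above can be illustrated with a pair of otherwise identical lookups. This is only a sketch: `lookup_signed` and `lookup_unsigned` are hypothetical names, and whether a given compiler really emits the extra sign-extension instruction depends on the compiler, its version and the optimization flags.

```c
/* Signed 32-bit index: the 32-bit sum has to be sign-extended to
   64 bits before it can take part in 64-bit address arithmetic. */
int lookup_signed(const int *table, int a, int b) {
    return table[a + b];
}

/* Unsigned 32-bit index: a 32-bit add already zeroes the upper half
   of the 64-bit register, so the sum can typically be used as is. */
int lookup_unsigned(const int *table, unsigned a, unsigned b) {
    return table[a + b];
}
```

Both functions return the same element for in-range, non-negative indices; the difference, if any, shows up only in the generated code, so comparing the compiler's assembly output for the two is the way to check it on a particular setup.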
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.