Author: Robert Hyatt
Date: 07:14:53 11/28/03
On November 28, 2003 at 02:14:58, Eugene Nalimov wrote:

>I don't think you need assembler code in Crafty on Opteron. The only functions
>that can benefit from it are FirstOne()/LastOne(). On Visual C I used
>intrinsics that expand to bsf/bsr, but IIRC gain was ~1%.
>
>Thanks,
>Eugene

I may try an inline FirstOne()/LastOne() for fun, since they are very easy to
write on an Opteron. But the separate versions certainly didn't help any...

>
>On November 27, 2003 at 13:48:22, Robert Hyatt wrote:
>
>>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>>
>>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>>
>>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>>system I am playing with. And I must say that after studying the Opteron
>>>>64 bit instruction set, I'm impressed.
>>>>
>>>>First, all the old opcodes work.. mov, sub, bsf, etc..
>>>>
>>>>Second, the familiar 8 32-bit regs are still there. But they can be
>>>>named %rax rather than %eax to stretch them to 64 bits. Cute. And
>>>>then there are 8 more registers you can use with the same old opcodes
>>>>and addressing modes.
>>>>
>>>>In short, it's well-thought-out and very easy to use. I'll post some
>>>>performance later. I have PopCnt(), FirstOne() and LastOne() working
>>>>fine. After I finish the others, I'll see how much (if any) it speeds
>>>>things up.
>>>
>>>Hi Bob,
>>>
>>>Very curious about your 64-bit results...
>>
>>First results are not good. FirstOne() and LastOne() in asm are actually
>>slower than the normal table-lookup in the C source. I would suspect it has
>>to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
>>inlined while the asm is coded as external procedures...
>>
>>>
>>>I see good chances for Matt Taylor's de Bruijn multiplication to become faster
>>>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle vector path
>>>instruction, but 32*32=64bit or 64*64=128bit became direct path instructions on
>>>AMD64 (3/5 cycles).
>>>Maybe you can try it some day...
>>
>>If you have anything that runs under linux, I can run it on this box easily.
>>
>>>
>>>The additional 8 gp registers r8-r15 (64-bit) or r8D-r15D (32-bit) may even be
>>>used as addressing registers (with REX prefix).
>>>
>>>Be aware of the "sign extension" penalty if using signed 32-bit variables as
>>>array indices. "long" is still 32-bit with msc but 64-bit in gcc:
>>
>>I know. I had to experiment a bit. Pointers = 64 bits, ints = 32 bits,
>>longs = 64 bits, using the gcc 3.2 compiler distributed on this Suse linux
>>distribution AMD is running on the machine I am playing with.
>>
>>>
>>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>>
>>>2.22 Using Unsigned Integers for 32-Bit Array Indices
>>>
>>>Optimization
>>>When using a 32-bit variable as an array index, declare the variable as an
>>>unsigned integer instead of a signed integer.
>>>
>>>Application
>>>This optimization applies to 64-bit software.
>>>
>>>Rationale
>>>When performing 64-bit address arithmetic, the compiler must insert an
>>>additional instruction into the object code to sign-extend a signed 32-bit
>>>integer, which reduces performance; no additional instruction is necessary to
>>>zero-extend an unsigned 32-bit integer because the processor performs
>>>zero-extension automatically. (There is no performance penalty for using 64-bit
>>>variables, either signed or unsigned, as array indices.)
>>>
>>>Cheers,
>>>Gerd
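
For anyone who wants to try the de Bruijn multiplication idea against bsf, here is a minimal sketch in C. It is the generic 64-bit form of the trick (isolate the lowest set bit, multiply by a de Bruijn constant, use the top six bits as a table index), not necessarily Matt Taylor's exact routine, which as far as I know folds to a 32-bit multiply. The names init_first_one_table() and first_one_debruijn() are made up for this example, and bit 0 is taken to be the least significant bit, which may not match Crafty's own bit numbering for FirstOne()/LastOne().

```c
#include <stdint.h>

/* A 64-bit de Bruijn sequence: every 6-bit window of it is distinct. */
static const uint64_t DEBRUIJN64 = 0x03f79d71b4cb0a89ULL;

/* Maps the top six bits of (lsb * DEBRUIJN64) back to a bit index. */
static int debruijn_index[64];

/* Build the table once at startup. Shifting the constant left by i
   and keeping the top six bits yields a unique slot for each i. */
void init_first_one_table(void)
{
    for (int i = 0; i < 64; i++)
        debruijn_index[(DEBRUIJN64 << i) >> 58] = i;
}

/* Index of the least significant set bit. bb must be nonzero. */
int first_one_debruijn(uint64_t bb)
{
    uint64_t lsb = bb & (~bb + 1);   /* isolate the lowest set bit */
    /* Multiplying by a power of two is the same as the shift above. */
    return debruijn_index[(lsb * DEBRUIJN64) >> 58];
}
```

Call init_first_one_table() once before the first lookup. Whether this actually beats bsf or the existing table lookup on a given Opteron would have to be measured, and it needs to be inlined to have any chance.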
Last modified: Thu, 15 Apr 21 08:11:13 -0700