Author: Robert Hyatt
Date: 10:48:22 11/27/03
On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:

>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>
>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>system I am playing with. And I must say that after studying the Opteron
>>64 bit instruction set, I'm impressed.
>>
>>First, all the old opcodes work.. mov, sub, bsf, etc..
>>
>>Second, the familiar 8 32-bit regs are still there. But they can be
>>named %rax rather than %eax to stretch them to 64 bits. Cute. And
>>then there are 8 more registers you can use with the same old opcodes
>>and addressing modes.
>>
>>In short, it's well-thought-out and very easy to use. I'll post some
>>performance later. I have PopCnt(), FirstOne() and LastOne() working
>>fine. After I finish the others, I'll see how much (if any) it speeds
>>things up.
>
>Hi Bob,
>
>Very curious about your 64-bit results...

First results are not good. FirstOne() and LastOne() in asm are actually
slower than the normal table-lookup in the C source. I would suspect it has
to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
inlined while the asm is coded as external procedures...

>
>I see good chances for Matt Taylor's de Bruijn multiplication to become faster
>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle vector path
>instruction, but 32*32=64bit or 64*64=128bit became direct path instructions on
>AMD64 (3/5 cycles). Maybe you can try it some day...

If you have anything that runs under linux, I can run it on this box easily.

>
>The additional 8 gp registers r8-r15 (64-bit) or r8D-r15D (32-bit) may even be
>used as addressing registers (with REX prefix).
>
>Be aware of the "signed extension" penalty if using signed 32-bit variables as
>array indices; "long" is still 32-bit with msc but 64-bit in gcc:

I know. I had to experiment a bit. Pointers = 64 bits, ints = 32 bits,
longs = 64 bits, using the gcc 3.2 compiler that comes with the Suse linux
distribution AMD is running on the machine I am playing with.

>
>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>
>2.22 Using Unsigned Integers for 32-Bit Array Indices
>
>Optimization
>When using a 32-bit variable as an array index, declare the variable as an
>unsigned integer instead of a signed integer.
>
>Application
>This optimization applies to 64-bit software.
>
>Rationale
>When performing 64-bit address arithmetic, the compiler must insert an
>additional instruction into the object code to sign-extend a signed 32-bit
>integer, which reduces performance; no additional instruction is necessary to
>zero-extend an unsigned 32-bit integer because the processor performs
>zero-extension automatically. (There is no performance penalty for using 64-bit
>variables—either signed or unsigned—as array indices.)
>
>Cheers,
>Gerd
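
For anyone who wants something concrete to time against bsf, here is a minimal C sketch of the de Bruijn multiplication idea mentioned above. It is the plain 64*64-bit form, not Matt Taylor's folded 32-bit routine, and it is not code from Crafty. The names lsb_index_debruijn and init_debruijn are made up for this illustration, and the constant is one commonly used 64-bit de Bruijn sequence. The lookup table is built at startup from the constant rather than typed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/* One commonly used 64-bit de Bruijn sequence: every 6-bit window of it
   is distinct, so the top 6 bits of (constant << i) identify i. */
static const uint64_t DEBRUIJN64 = 0x03f79d71b4cb0a89ULL;

/* Maps the 6-bit hash back to a bit index; filled in by init_debruijn(). */
static int debruijn_index[64];

/* Build the table once at startup (hypothetical helper). */
static void init_debruijn(void)
{
    for (int i = 0; i < 64; i++)
        debruijn_index[(DEBRUIJN64 << i) >> 58] = i;
}

/* Index (0..63) of the least significant 1 bit, the bit bsf would find.
   bb must be nonzero. */
static int lsb_index_debruijn(uint64_t bb)
{
    assert(bb != 0);
    uint64_t lsb = bb & (~bb + 1);      /* isolate the lowest set bit */
    /* Multiplying by a power of two is a left shift of the constant,
       so the top 6 bits of the product select the table entry. */
    return debruijn_index[(lsb * DEBRUIJN64) >> 58];
}
```

Call init_debruijn() once before the first lookup. Since the function is static C it can be inlined by the compiler, which matters for point (c) above. Whether it actually beats bsf or the existing table lookup on the Opteron is something only a measurement on the real machine can answer.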
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.