Computer Chess Club Archives


Subject: Re: An Opteron note

Author: Robert Hyatt

Date: 10:48:22 11/27/03



On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:

>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>
>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>system I am playing with.  And I must say that after studying the Opteron
>>64 bit instruction set, I'm impressed.
>>
>>First, all the old opcodes work:  mov, sub, bsf, etc.
>>
>>Second, the familiar eight 32-bit regs are still there.  But they can be
>>named %rax rather than %eax to stretch them to 64 bits.  Cute.  And
>>then there are eight more registers you can use with the same old opcodes
>>and addressing modes.
>>
>>In short, it's well-thought-out and very easy to use.  I'll post some
>>performance later.  I have PopCnt(), FirstOne() and LastOne() working
>>fine.  After I finish the others, I'll see how much (if any) it speeds
>>things up.
>
>Hi Bob,
>
>Very curious about your 64-bit results...

First results are not good.  FirstOne() and LastOne() in asm are actually
slower than the normal table lookup in the C source.  I suspect this is
due to (a) the bigger cache; (b) bsf/bsr not being fast; (c) the C versions
can be inlined while the asm versions are coded as external procedures...
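For comparison, a generic table-lookup bit scan in C might look like the minimal sketch below. It is not the C source being benchmarked here: the names `lsb64`, `msb64` and `init_bit_tables`, the 16-bit tables, and the convention that bit 0 is the least significant bit are all assumptions for illustration. Because the functions are `static inline`, the compiler is free to fold them into the caller, which is point (c) above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit lookup tables: lsb16[x] / msb16[x] hold the index of
   the lowest / highest set bit of x. Entry 0 is unused (no bit set). */
static uint8_t lsb16[65536];
static uint8_t msb16[65536];

/* Must be called once before lsb64()/msb64(). */
static void init_bit_tables(void) {
    lsb16[0] = msb16[0] = 0;
    for (uint32_t x = 1; x < 65536; x++) {
        /* Lowest set bit: 0 if x is odd, else one more than for x >> 1. */
        lsb16[x] = (x & 1) ? 0 : (uint8_t)(lsb16[x >> 1] + 1);
        /* Highest set bit: 0 for x == 1, else one more than for x >> 1. */
        msb16[x] = (x == 1) ? 0 : (uint8_t)(msb16[x >> 1] + 1);
    }
}

/* Index (0 = least significant) of the lowest set bit; b must be nonzero. */
static inline int lsb64(uint64_t b) {
    assert(b != 0);
    if (b & 0xFFFF)         return      lsb16[b & 0xFFFF];
    if ((b >> 16) & 0xFFFF) return 16 + lsb16[(b >> 16) & 0xFFFF];
    if ((b >> 32) & 0xFFFF) return 32 + lsb16[(b >> 32) & 0xFFFF];
    return 48 + lsb16[b >> 48];
}

/* Index (0 = least significant) of the highest set bit; b must be nonzero. */
static inline int msb64(uint64_t b) {
    assert(b != 0);
    if (b >> 48) return 48 + msb16[b >> 48];
    if (b >> 32) return 32 + msb16[(b >> 32) & 0xFFFF];
    if (b >> 16) return 16 + msb16[(b >> 16) & 0xFFFF];
    return msb16[b & 0xFFFF];
}
```

After `init_bit_tables()`, a board with bits 10 and 44 set (`0x0000100000000400`) gives `lsb64() == 10` and `msb64() == 44`. The 64 KB per table is a deliberate trade of cache space for a short, branch-light lookup.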

>
>I see a good chance for Matt Taylor's de Bruijn multiplication to become faster
>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle VectorPath
>instruction, while 32*32=64-bit and 64*64=128-bit multiplies became DirectPath
>instructions on AMD64 (3/5 cycles).  Maybe you can try it some day...

If you have anything that runs under Linux, I can run it on this box easily.
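The de Bruijn idea can be sketched in portable C as below. This is the plain 64-bit multiply-and-lookup form, not a reproduction of Matt Taylor's exact code; the constant `0x03f79d71b4cb0a89` is a 64-bit de Bruijn sequence whose top six bits are zero (the one listed on the Chess Programming Wiki), and the names `bitscan_forward`, `init_debruijn` and `debruijn_index` are hypothetical. Isolating the lowest set bit with `b & (0 - b)` turns the multiply into a left shift by the bit index, and the top six bits of the product are then unique for each index.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit de Bruijn sequence with its top six bits zero, so every left shift
   by 0..63 leaves a distinct 6-bit pattern in bits 63..58. */
#define DEBRUIJN64 0x03f79d71b4cb0a89ULL

static int debruijn_index[64];

/* Build the lookup table from the constant itself so the two always agree.
   Must be called once before bitscan_forward(). */
static void init_debruijn(void) {
    for (int i = 0; i < 64; i++)
        debruijn_index[((1ULL << i) * DEBRUIJN64) >> 58] = i;
}

/* Index (0 = least significant) of the lowest set bit; b must be nonzero.
   One 64x64-bit multiply, one shift and one table load; no bsf. */
static inline int bitscan_forward(uint64_t b) {
    assert(b != 0);
    return debruijn_index[((b & (0 - b)) * DEBRUIJN64) >> 58];
}
```

Whether this beats `bsf` on a given Opteron is exactly what would need measuring; the sketch only shows that the scan reduces to a multiply the AMD64 core executes as a DirectPath instruction.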




>
>The additional 8 GP registers r8-r15 (64-bit) or r8d-r15d (32-bit) may even be
>used as addressing registers (with a REX prefix).
>
>Be aware of the sign-extension penalty when using signed 32-bit variables as
>array indices; "long" is still 32-bit with MSC but 64-bit with gcc:

I know.  I had to experiment a bit.  Pointers = 64 bits, ints = 32 bits,
longs = 64 bits, using the gcc 3.2 compiler that comes with the SuSE Linux
distribution AMD is running on the machine I am playing with.
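The sizes found above match the LP64 data model that gcc uses on 64-bit Linux; 64-bit Windows compilers such as MSC follow LLP64, where long stays at 32 bits. Rather than discovering this by experiment, a program can state its assumptions at compile time. A minimal sketch, assuming a C11 compiler (for `static_assert`); the function names `data_model` and `print_sizes` are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumptions shared by the 64-bit targets discussed in this thread. */
static_assert(sizeof(int) * CHAR_BIT == 32, "int expected to be 32 bits");
static_assert(sizeof(void *) * CHAR_BIT == 64, "pointers expected to be 64 bits");

/* Names the data model: LP64 (long is 64-bit, e.g. gcc on Linux) or
   LLP64 (long stays 32-bit, e.g. MSC on 64-bit Windows). */
static const char *data_model(void) {
    if (sizeof(long) == 8 && sizeof(void *) == 8) return "LP64";
    if (sizeof(long) == 4 && sizeof(void *) == 8) return "LLP64";
    return "other";
}

/* Prints the widths this compiler actually uses. */
static void print_sizes(void) {
    printf("int=%zu long=%zu long long=%zu pointer=%zu (%s)\n",
           sizeof(int), sizeof(long), sizeof(long long), sizeof(void *),
           data_model());
}
```

Code that needs exactly 64 bits, such as a bitboard, can use `uint64_t` from `<stdint.h>` instead of relying on how wide `long` happens to be.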




>
>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>
>2.22 Using Unsigned Integers for 32-Bit Array Indices
>
>Optimization
>When using a 32-bit variable as an array index, declare the variable as an
>unsigned integer instead of a signed integer.
>
>Application
>This optimization applies to 64-bit software.
>
>Rationale
>When performing 64-bit address arithmetic, the compiler must insert an
>additional instruction into the object code to sign-extend a signed 32-bit
>integer, which reduces performance; no additional instruction is necessary to
>zero-extend an unsigned 32-bit integer because the processor performs
>zero-extension automatically. (There is no performance penalty for using 64-bit
>variables—either signed or unsigned—as array indices.)
>
>Cheers,
>Gerd
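A minimal C sketch of the guideline quoted above; the table `square_mask` and the functions `mask_signed`/`mask_unsigned` are hypothetical. On x86-64, an instruction that writes a 32-bit register clears the upper 32 bits, so an unsigned 32-bit index computed in a register is already a valid 64-bit offset, while a signed one typically needs a sign extension (`movsxd`/`cdqe`) before it can be used in the address. Whether the extra instruction actually appears depends on the compiler, optimization level and surrounding code, so the comments describe typical output rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: one single-bit mask per square 0..63. */
static uint64_t square_mask[64];

/* Must be called once before the lookups below. */
static void init_square_mask(void) {
    for (unsigned sq = 0; sq < 64; sq++)
        square_mask[sq] = 1ULL << sq;
}

/* Signed 32-bit index: the compiler typically sign-extends sq to 64 bits
   before forming the address. rank and file must be in 0..7. */
uint64_t mask_signed(int rank, int file) {
    int sq = rank * 8 + file;
    return square_mask[sq];
}

/* Unsigned 32-bit index: the 32-bit arithmetic already leaves the upper
   half of the register zero, so no extra extension is needed. */
uint64_t mask_unsigned(unsigned rank, unsigned file) {
    unsigned sq = rank * 8 + file;
    return square_mask[sq];
}
```

Both functions return the same masks for valid squares; comparing the generated assembly (for example with `gcc -O2 -S`) shows whether a given compiler emits the extension.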




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.