Computer Chess Club Archives


Subject: Re: An Opteron note

Author: Eugene Nalimov

Date: 23:14:58 11/27/03


I don't think you need assembler code in Crafty on Opteron. The only functions
that could benefit from it are FirstOne()/LastOne(). With Visual C I used
intrinsics that expand to bsf/bsr, but IIRC the gain was only ~1%.

Thanks,
Eugene

On November 27, 2003 at 13:48:22, Robert Hyatt wrote:

>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>
>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>
>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>system I am playing with.  And I must say that after studying the Opteron
>>>64 bit instruction set, I'm impressed.
>>>
>>>First, all the old opcodes work: mov, sub, bsf, etc.
>>>
>>>Second, the familiar eight 32-bit regs are still there, but they can be
>>>named %rax rather than %eax to stretch them to 64 bits.  Cute.  And
>>>then there are 8 more registers you can use with the same old opcodes
>>>and addressing modes.
>>>
>>>In short, it's well-thought-out and very easy to use.  I'll post some
>>>performance later.  I have PopCnt(), FirstOne() and LastOne() working
>>>fine.  After I finish the others, I'll see how much (if any) it speeds
>>>things up.
>>
>>Hi Bob,
>>
>>Very curious about your 64-bit results...
>
>First results are not good.  FirstOne() and LastOne() in asm are actually
>slower than the normal table-lookup in the C source.  I would suspect it has
>to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
>inlined while the asm is coded as external procedures...
>
>>
>>I see a good chance that Matt Taylor's de Bruijn multiplication will be faster
>>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle vector-path
>>instruction, while 32*32=64-bit and 64*64=128-bit multiplies became direct-path
>>instructions on AMD64 (3/5 cycles). Maybe you can try it some day...
>
>If you have anything that runs under Linux, I can run it on this box easily.
>
>>
>>The additional 8 GP registers r8-r15 (64-bit) or r8d-r15d (32-bit) can even
>>be used as addressing registers (with a REX prefix).
>>
>>Be aware of the sign-extension penalty when using signed 32-bit variables as
>>array indices; also, "long" is still 32-bit with msc but 64-bit with gcc:
>
>I know.  I had to experiment a bit.  Pointers = 64 bits, ints = 32 bits,
>longs = 64 bits, using the gcc 3.2 compiler that ships with the SuSE Linux
>distribution AMD is running on the machine I am playing with.
>
>>
>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>
>>2.22 Using Unsigned Integers for 32-Bit Array Indices
>>
>>Optimization
>>When using a 32-bit variable as an array index, declare the variable as an
>>unsigned integer instead of a signed integer.
>>
>>Application
>>This optimization applies to 64-bit software.
>>
>>Rationale
>>When performing 64-bit address arithmetic, the compiler must insert an
>>additional instruction into the object code to sign-extend a signed 32-bit
>>integer, which reduces performance; no additional instruction is necessary to
>>zero-extend an unsigned 32-bit integer because the processor performs
>>zero-extension automatically. (There is no performance penalty for using 64-bit
>>variables—either signed or unsigned—as array indices.)
>>
>>Cheers,
>>Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.