Computer Chess Club Archives



Subject: Re: An Opteron note

Author: Robert Hyatt

Date: 07:14:53 11/28/03


On November 28, 2003 at 02:14:58, Eugene Nalimov wrote:

>I don't think you need assembler code in Crafty on Opteron. The only functions
>that can benefit from it are FirstOne()/LastOne(). On Visual C I used
>intrinsics that expand to bsf/bsr, but IIRC the gain was ~1%.
>
>Thanks,
>Eugene

I may try an inline FirstOne()/LastOne() for fun, since they are very
easy to write on an Opteron.  But the separate (non-inlined) asm versions
certainly didn't help any...
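A minimal sketch of such inline versions, assuming gcc on x86-64: the `__builtin_ctzll()`/`__builtin_clzll()` builtins typically compile to a single bsf/bsr (or tzcnt on newer targets), and because the functions are `static inline` they can be expanded at each call site instead of being called as external asm procedures. The names `lsb_index()`/`msb_index()` and the numbering (bit 0 = least significant bit) are assumptions for illustration; Crafty's FirstOne()/LastOne() may number bits differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not Crafty's code): bit 0 = least significant bit. */

/* Index of the lowest set bit; gcc typically emits a single bsf (or tzcnt). */
static inline int lsb_index(uint64_t b)
{
    assert(b != 0);                  /* the builtins are undefined for 0 */
    return __builtin_ctzll(b);       /* count trailing zero bits */
}

/* Index of the highest set bit; 63 - clz typically folds to a single bsr. */
static inline int msb_index(uint64_t b)
{
    assert(b != 0);
    return 63 - __builtin_clzll(b);  /* 63 minus the count of leading zero bits */
}
```

For example, `lsb_index(0x90)` is 4 and `msb_index(0x90)` is 7.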



>
>On November 27, 2003 at 13:48:22, Robert Hyatt wrote:
>
>>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>>
>>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>>
>>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>>system I am playing with.  And I must say that after studying the Opteron
>>>>64 bit instruction set, I'm impressed.
>>>>
>>>>First, all the old opcodes work: mov, sub, bsf, etc.
>>>>
>>>>Second, the familiar eight 32-bit registers are still there, but they can be
>>>>named %rax rather than %eax to stretch them to 64 bits.  Cute.  And
>>>>then there are eight more registers (%r8-%r15) you can use with the same old
>>>>opcodes and addressing modes.
>>>>
>>>>In short, it's well thought out and very easy to use.  I'll post some
>>>>performance numbers later.  I have PopCnt(), FirstOne() and LastOne() working
>>>>fine.  After I finish the others, I'll see how much (if any) it speeds
>>>>things up.
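The first Opterons have no population-count instruction, so a portable C PopCnt() has to count bits in software. A minimal sketch of the standard SWAR (SIMD-within-a-register) method, assuming 64-bit `uint64_t` bitboards; `pop_count()` is a hypothetical name, not Crafty's PopCnt().

```c
#include <stdint.h>

/* Hypothetical portable population count (SWAR), not Crafty's PopCnt(). */
static inline int pop_count(uint64_t b)
{
    b = b - ((b >> 1) & UINT64_C(0x5555555555555555));        /* 2-bit sums */
    b = (b & UINT64_C(0x3333333333333333))
        + ((b >> 2) & UINT64_C(0x3333333333333333));           /* 4-bit sums */
    b = (b + (b >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);        /* 8-bit sums */
    return (int)((b * UINT64_C(0x0101010101010101)) >> 56);   /* add all bytes */
}
```

For example, `pop_count(0x90)` returns 2.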
>>>
>>>Hi Bob,
>>>
>>>Very curious about your 64-bit results...
>>
>>First results are not good.  FirstOne() and LastOne() in asm are actually
>>slower than the normal table-lookup in the C source.  I suspect it has
>>to do with (a) the bigger cache, which keeps the lookup tables resident;
>>(b) bsf/bsr are not fast; (c) the C can be inlined while the asm is coded
>>as external procedures...
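For comparison, a minimal sketch of the kind of table lookup a portable C LastOne() can use, assuming a 64 KB table indexed by 16-bit chunks of the bitboard. `msb16[]`, `init_msb16()` and `msb_index_table()` are hypothetical names, not Crafty's source. Being `static inline`, the function can be expanded at the call site, which is point (c), and with a large cache the table tends to stay resident, which is point (a).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64 KB table: msb16[i] = index of the highest set bit of i. */
static unsigned char msb16[65536];

/* Fill the table once at startup: msb(i) = msb(i / 2) + 1 for i >= 2. */
static void init_msb16(void)
{
    msb16[1] = 0;  /* msb16[0] is never consulted for a non-empty chunk */
    for (uint32_t i = 2; i < 65536; i++)
        msb16[i] = (unsigned char)(msb16[i >> 1] + 1);
}

/* Highest set bit of a non-zero bitboard, checking 16 bits at a time
   from the top; static inline so it can be expanded at each call site. */
static inline int msb_index_table(uint64_t b)
{
    assert(b != 0);
    if (b >> 48)
        return 48 + msb16[b >> 48];
    if (b >> 32)
        return 32 + msb16[(b >> 32) & 0xFFFF];
    if (b >> 16)
        return 16 + msb16[(b >> 16) & 0xFFFF];
    return msb16[b & 0xFFFF];
}
```

`init_msb16()` must be called once before the first lookup.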
>>
>>>
>>>I see a good chance that Matt Taylor's de Bruijn multiplication will be faster
>>>than a single bsf on AMD64, because bsf reg64 is still a 9-cycle vector-path
>>>instruction, while 32*32=64-bit and 64*64=128-bit multiplies became direct-path
>>>instructions on AMD64 (3 and 5 cycles). Maybe you can try it some day...
>>
>>If you have anything that runs under linux, I can run it on this box easily.
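A minimal sketch of a de Bruijn bit scan that runs under Linux. Note that it swaps in the plain 64-bit-multiply variant with the widely used constant 0x03f79d71b4cb0a89; Matt Taylor's routine instead folds the bitboard to 32 bits and uses a 32-bit multiply, and is not reproduced here. `b & (0 - b)` isolates the lowest set bit 2^i, multiplying the constant by 2^i shifts it left by i, and the top 6 bits of the product then identify i. The table is built at startup rather than typed in; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widely used 64-bit de Bruijn constant: the top 6 bits of
   (DEBRUIJN64 << i) differ for every i from 0 to 63. */
#define DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

static int debruijn_index[64];

/* Build the table: (1 << i) * DEBRUIJN64 is the constant shifted left by i,
   so its top 6 bits form a distinct key for every i. */
static void init_debruijn(void)
{
    for (int i = 0; i < 64; i++)
        debruijn_index[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] = i;
}

/* Index of the lowest set bit (bit 0 = least significant); b must be non-zero. */
static inline int lsb_index_debruijn(uint64_t b)
{
    assert(b != 0);
    uint64_t lowest = b & (0 - b);  /* isolate the lowest set bit */
    return debruijn_index[(lowest * DEBRUIJN64) >> 58];
}
```

`init_debruijn()` must run once before the first scan.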
>>
>>>
>>>The additional eight general-purpose registers r8-r15 (64-bit), or r8d-r15d
>>>(32-bit), can even be used as addressing registers (with a REX prefix).
>>>
>>>Be aware of the sign-extension penalty when using signed 32-bit variables as
>>>array indices; "long" is still 32-bit with msc but 64-bit with gcc:
>>
>>I know.  I had to experiment a bit.  Pointers = 64 bits, ints = 32 bits,
>>longs = 64 bits, using the gcc 3.2 compiler distributed with the SuSE Linux
>>distribution AMD is running on the machine I am playing with.
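Since `long` differs between gcc (64 bits) and msc (32 bits), a 64-bit bitboard is safer declared with `uint64_t` from `<stdint.h>`. A minimal sketch, assuming a C11 compiler; the `BITBOARD` typedef and `print_data_model()` are illustrative names. The function prints the sizes just mentioned, to confirm a compiler's data model.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bitboard type: exactly 64 bits whatever size long has
   (64 bits with gcc on x86-64 Linux, 32 bits with 64-bit Windows compilers). */
typedef uint64_t BITBOARD;

/* C11 static_assert (from <assert.h>): fail the build if the width is wrong. */
static_assert(sizeof(BITBOARD) * CHAR_BIT == 64, "BITBOARD must be 64 bits");

/* Print the data model of the compiler in use. */
static void print_data_model(void)
{
    printf("int: %zu bits, long: %zu bits, pointer: %zu bits\n",
           sizeof(int) * CHAR_BIT, sizeof(long) * CHAR_BIT,
           sizeof(void *) * CHAR_BIT);
}
```

On the gcc setup described here it should report 32, 64 and 64 bits.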
>>
>>>
>>>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™
>>>
>>>2.22 Using Unsigned Integers for 32-Bit Array Indices
>>>
>>>Optimization
>>>When using a 32-bit variable as an array index, declare the variable as an
>>>unsigned integer instead of a signed integer.
>>>
>>>Application
>>>This optimization applies to 64-bit software.
>>>
>>>Rationale
>>>When performing 64-bit address arithmetic, the compiler must insert an
>>>additional instruction into the object code to sign-extend a signed 32-bit
>>>integer, which reduces performance; no additional instruction is necessary to
>>>zero-extend an unsigned 32-bit integer because the processor performs
>>>zero-extension automatically. (There is no performance penalty for using 64-bit
>>>variables—either signed or unsigned—as array indices.)
>>>
>>>Cheers,
>>>Gerd
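A minimal sketch of the guide's advice applied to a board lookup, assuming a hypothetical 64-entry `board[]` array indexed by `rank * 8 + file`. With `int` operands the compiler may need a movslq (movsxd) to sign-extend the 32-bit square index before the 64-bit address calculation; with `unsigned` operands the 32-bit result is already zero-extended. What a given compiler actually emits varies, so compare the output of `gcc -O2 -S`.

```c
#include <assert.h>

/* Hypothetical board: 64 squares, square = rank * 8 + file. */

/* Signed operands: the 32-bit index may need an explicit sign-extension
   (movslq/movsxd) before it is used in 64-bit address arithmetic. */
static int piece_at_signed(const int board[64], int rank, int file)
{
    assert(rank >= 0 && rank < 8 && file >= 0 && file < 8);
    return board[rank * 8 + file];
}

/* Unsigned operands: a 32-bit result is zero-extended by the processor,
   so no extra extension instruction should be needed. */
static int piece_at_unsigned(const int board[64], unsigned rank, unsigned file)
{
    assert(rank < 8 && file < 8);
    return board[rank * 8 + file];
}
```

Both functions return the same value; only the generated address arithmetic may differ.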




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.