Computer Chess Club Archives

Subject: Re: ASM/Optimization

Author: Gerd Isenberg

Date: 12:43:00 12/06/05

On December 06, 2005 at 13:13:26, Zappa wrote:

>On December 06, 2005 at 03:31:29, Gerd Isenberg wrote:
>
>>On December 05, 2005 at 23:24:52, Zappa wrote:
>>
>>>I am getting really, really tired of coding all my evaluation twice (once for
>>>white and once for black).  However, one of the things that is keeping me from
>>>switching to a for(i < 2) loop is that I can't do a shift!
>>>
>>>For example, if I have some pattern based on (pawns << 8) for white, then that
>>>is (pawns >> 8) for black, and you can't do a negative shift in IA32.
>>>
>>>My ideas:
>>>
>>>Eugene will happily point out that on the Itanium, doing two shifts and selecting
>>>the correct value takes 1 (2?) bundles.
>>>
>>>Otherwise on AMD64 I could do
>>>
>>>a) two shifts & cmov.  I think 5 instructions (as compared to 1, and I have a
>>>LOT of shifts).
>>>
>>>b) << followed by >>.  1 extra instruction but I have twice as many loads for
>>>constants.
>>>
>>>c) rotate (X | 64-x) (but then I have the possibility of things ending up
>>>rotating around).
>>>
>>>d) your name here . . . :)
>>>
>>>I am not that concerned about latency because there would usually be a lot of
>>>stuff around that could be rescheduled, but if I have to do 5 instructions for
>>>every shift my code size will triple.
>>>
>>>anthony
>>
>>
>>
>>a) mixture of a and b
>>
>>// assuming color ::= {0,1} := {white, black}
>>shiftCountWhite = shiftCount & (color-1);
>>shiftCountBlack = shiftCount & -color; // shiftCountWhite ^ shiftCount
>>x <<= shiftCountWhite;
>>x >>= shiftCountBlack;
>>
>>d) conditional generalized shift.
>>
>>if (color)
>> x >>= shiftCount;
>>else
>> x <<= shiftCount;
>>
>>If the routine is inlined and color is a compile-time constant (due to unrolling
>>color loops), the compiler will optimize the not-taken branch away. Otherwise,
>>how likely is a misprediction?
>>
>>Gerd
>
>I think the cmov solution is still better:
>
>mov
>shr
>shl
>test
>cmov

Yes, very good: fewer dependencies, except the flag dependency for cmov.
But I would still give the C version a try ;-)

shiftCount = (color-1) & 8;
x <<= shiftCount;
x >>= shiftCount ^ 8;

dec  cl    ; color-1
and  cl, 8
shl  rax, cl
xor  cl, 8
shr  rax, cl

Compare the code size; of course dec/and/xor may also be 32-bit instructions,
depending on the compiler and possibly on the types or casts involved.
It helps if there are other instructions around to break up these dependencies.

Lance's rotate idea also looks very promising.

Gerd

>
>vs
>
>test
>jz
>shr
>jmp
>shl
>




>Of course all of these will be really slow on 32-bit anyway :(
>
>anthony

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.