Author: Gerd Isenberg
Date: 23:22:18 12/06/05
On December 06, 2005 at 15:43:00, Gerd Isenberg wrote:

>On December 06, 2005 at 13:13:26, Zappa wrote:
>
>>On December 06, 2005 at 03:31:29, Gerd Isenberg wrote:
>>
>>>On December 05, 2005 at 23:24:52, Zappa wrote:
>>>
>>>>I am getting really, really tired of coding all my evaluation twice (once for
>>>>white and once for black). However, one of the things that is keeping me from
>>>>switching to a for(i < 2) loop is that I can't do a shift!
>>>>
>>>>For example, if I have some pattern based on (pawns << 8) for white, then that
>>>>is (pawns >> 8) for black, and you can't do a negative shift in IA32.
>>>>
>>>>My ideas:
>>>>
>>>>Eugene will happily point out that on the Itanium doing two shifts and selecting
>>>>the correct value is 1 (2?) bundles.
>>>>
>>>>Otherwise on AMD64 I could do
>>>>
>>>>a) two shifts & cmov. I think 5 instructions (as compared to 1, and I have a
>>>>LOT of shifts).
>>>>
>>>>b) << followed by >>. 1 extra instruction but I have twice as many loads for
>>>>constants.
>>>>
>>>>c) rotate (X | 64-x) (but then I have the possibility of things ending up
>>>>rotating around).
>>>>
>>>>d) your name here . . . :)
>>>>
>>>>I am not that concerned about latency because there would usually be a lot of
>>>>stuff around that could be rescheduled, but if I have to do 5 instructions for
>>>>every shift my code size will triple.
>>>>
>>>>anthony
>>>
>>>
>>>
>>>a) mixture of a and b
>>>
>>>// assuming color ::= {0,1} := {white, black}
>>>shiftCountWhite = shiftCount & (color-1);
>>>shiftCountBlack = shiftCount & -color; // or: shiftCountWhite ^ shiftCount
>>>x <<= shiftCountWhite;
>>>x >>= shiftCountBlack;
>>>
>>>d) conditional generalized shift.
>>>
>>>if (color)
>>>  x >>= shiftCount;
>>>else
>>>  x <<= shiftCount;
>>>
>>>If the routine is inlined and color is a compile time constant (due to unrolling
>>>color loops) the compiler will optimize the not-taken branch away - otherwise
>>>how likely is a misprediction?
>>>
>>>Gerd
>>
>>I think the cmov solution is still better:
>>
>>mov
>>srl
>>sll
>>test
>>cmov
>
>Yes, very good - fewer dependencies except the flag-dependency for cmov.
>But I would still give the C-version a try ;-)
>
>shiftCount = (color-1) & 8;
>x <<= shiftCount;
>x >>= shiftCount ^ 8;
>
>dec cl ; color-1
>and cl, 8
>shl rax, cl
>xor cl, 8
>shr rax, cl
>

Or, with power-of-2 shift counts, a leading shift of the color itself is also fine
(ecx holds color, 0 or 1):

x >>= (color<<3);
x <<= (color<<3) ^ 8;

shl ecx, 3 ; color*8
shr rax, cl
xor cl, 8
shl rax, cl

>You may compare codesize - of course dec,and,xor may also be 32-bit instructions
>- depends on compiler and possibly types or some casts.
>If you have some other instructions around to break some dependencies....
>
>Lance's rotate one also looks very promising.

... especially for pawn-attacks where a wrap-and is needed anyway.

>
>Gerd
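For anyone who wants to compare the variants side by side, here is a minimal C sketch of the snippets above. It assumes color is 0 for white and 1 for black, a 64-bit bitboard, and a shift count in the range 0..63. The function names are made up for illustration and are not taken from any engine.

```c
#include <stdint.h>

/* Reference version with a branch: white (0) shifts left, black (1) shifts right. */
uint64_t shift_branch(uint64_t x, unsigned color, unsigned n)
{
    return color ? x >> n : x << n;
}

/* Masked counts: one of the two shifts always gets a count of 0. */
uint64_t shift_masked(uint64_t x, unsigned color, unsigned n)
{
    unsigned left  = n & (color - 1u);  /* n for white, 0 for black */
    unsigned right = n & (0u - color);  /* 0 for white, n for black */
    return (x << left) >> right;
}

/* Fixed power-of-two count (8): the xor toggles the count between 8 and 0. */
uint64_t shift_by_8(uint64_t x, unsigned color)
{
    unsigned left = (color - 1u) & 8u;  /* 8 for white, 0 for black */
    x <<= left;
    x >>= left ^ 8u;
    return x;
}

/* Same idea with the right shift leading and the count derived from color << 3. */
uint64_t shift_by_8_alt(uint64_t x, unsigned color)
{
    unsigned right = color << 3;        /* 0 for white, 8 for black */
    x >>= right;
    x <<= right ^ 8u;
    return x;
}

/* Returns 1 if every branchless variant matches the branchy reference for x. */
int variants_agree(uint64_t x)
{
    for (unsigned color = 0; color < 2; ++color) {
        uint64_t expected = shift_branch(x, color, 8);
        if (shift_masked(x, color, 8) != expected) return 0;
        if (shift_by_8(x, color) != expected) return 0;
        if (shift_by_8_alt(x, color) != expected) return 0;
    }
    return 1;
}
```

Whether any of these beats the cmov sequence or a well-predicted branch depends on the compiler and the surrounding code, so it is worth checking the generated assembly and code size rather than taking the C source at face value.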
Last modified: Thu, 15 Apr 21 08:11:13 -0700