Author: Gerd Isenberg
Date: 12:43:00 12/06/05
On December 06, 2005 at 13:13:26, Zappa wrote:

>On December 06, 2005 at 03:31:29, Gerd Isenberg wrote:
>
>>On December 05, 2005 at 23:24:52, Zappa wrote:
>>
>>>I am getting really, really tired of coding all my evaluation twice (once for
>>>white and once for black). However, one of the things that is keeping me from
>>>switching to a for(i < 2) loop is that I can't do a shift!
>>>
>>>For example, if I have some pattern based on (pawns << 8) for white, then that
>>>is (pawns >> 8) for black, and you can't do a negative shift in IA32.
>>>
>>>My ideas:
>>>
>>>Eugene will happily point out that on the Itanium doing two shifts and selecting
>>>the correct value is 1 (2?) bundles.
>>>
>>>Otherwise on AMD64 I could do
>>>
>>>a) two shifts & cmov. I think 5 instructions (as compared to 1, and I have a
>>>LOT of shifts).
>>>
>>>b) << followed by >>. 1 extra instruction but I have twice as many loads for
>>>constants.
>>>
>>>c) rotate (X | 64-x) (but then I have the possibility of things ending up
>>>rotating around).
>>>
>>>d) your name here . . . :)
>>>
>>>I am not that concerned about latency because there would usually be a lot of
>>>stuff around that could be rescheduled, but if I have to do 5 instructions for
>>>every shift my code size will triple.
>>>
>>>anthony
>>
>>
>>
>>a) mixture of a and b
>>
>>// assuming color ::= {0,1} := {white, black}
>>shiftCountWhite = shiftCount & (color-1);
>>shiftCountBlack = shiftCount & -color; // shiftCountWhite ^ shiftCount
>>x <<= shiftCountWhite;
>>x >>= shiftCountBlack;
>>
>>d) conditional generalized shift.
>>
>>if (color)
>>  x >>= shiftCount;
>>else
>>  x <<= shiftCount;
>>
>>If the routine is inlined and color is a compile time constant (due to unrolling
>>color loops) the compiler will optimize the non-taken branch away - otherwise
>>how likely is a misprediction?
>>
>>Gerd
>
>I think the cmov solution is still better:
>
>mov
>srl
>sll
>test
>cmov

Yes, very good - fewer dependencies except the flag-dependency for cmov.
But I would still give the C-version a try ;-)

shiftCount = (color-1) & 8;
x <<= shiftCount;
x >>= shiftCount ^ 8;

dec cl       ; color-1
and cl, 8
shl rax, cl
xor cl, 8
shr rax, cl

You may compare codesize - of course dec, and, xor may also be 32-bit
instructions - depends on compiler and possibly types or some casts.
If you have some other instructions around to break some dependencies....

Lance's rotate one looks also very promising.

Gerd

>
>vs
>
>test
>jmp
>srl
>jmp
>sll
>
>Of course all of these will be really slow on 32-bits anyway :(
>
>anthony
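
For anyone who wants to compare the variants side by side, here is a minimal C sketch of the three branchless ideas from this thread: the masked shift counts, the "two shifts and select" form that a compiler may (or may not) turn into cmov, and the fixed one-rank version. The function names are made up for illustration, and the asserts reflect an assumption that the count stays below 64 and that color is 0 (white) or 1 (black). Whether any of these beats the plain if/else depends on the compiler and on the surrounding code, so treat it as something to measure, not a recommendation.

```c
#include <assert.h>
#include <stdint.h>

/* Convention from the thread: color 0 = white (shift left),
   color 1 = black (shift right). All names below are hypothetical. */

/* Masked counts: exactly one of the two shift counts is non-zero. */
static inline uint64_t shift_masked(uint64_t x, unsigned n, unsigned color)
{
    assert(n < 64 && color < 2);        /* shifting by >= 64 is undefined in C */
    unsigned left  = n & (color - 1u);  /* n for white, 0 for black */
    unsigned right = n & (0u - color);  /* 0 for white, n for black */
    return (x << left) >> right;
}

/* Compute both shifts and pick one; a compiler may emit cmov for this. */
static inline uint64_t shift_select(uint64_t x, unsigned n, unsigned color)
{
    assert(n < 64 && color < 2);
    uint64_t up   = x << n;
    uint64_t down = x >> n;
    return color ? down : up;
}

/* Fixed distance of 8 (one rank), as in the dec/and/shl/xor/shr sequence. */
static inline uint64_t shift_rank(uint64_t x, unsigned color)
{
    assert(color < 2);
    unsigned left = (color - 1u) & 8u;  /* 8 for white, 0 for black */
    x <<= left;
    x >>= left ^ 8u;                    /* 0 for white, 8 for black */
    return x;
}
```

In a color loop this would be used along the lines of shift_rank(pawns[color], color), where pawns[] is an assumed per-color bitboard array. Bits pushed past either end of the board simply drop out, unlike with the rotate variant.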
Last modified: Thu, 15 Apr 21 08:11:13 -0700