Author: Robert Hyatt
Date: 15:12:15 07/01/04
On July 01, 2004 at 16:56:50, Gerd Isenberg wrote:

>On July 01, 2004 at 15:49:21, Robert Hyatt wrote:
>
>>On July 01, 2004 at 15:02:34, Gerd Isenberg wrote:
>>
>>>On July 01, 2004 at 11:13:14, Robert Hyatt wrote:
>>>
>>>>On July 01, 2004 at 02:50:35, Tony Werten wrote:
>>>>
>>>>>Hi all,
>>>>>
>>>>>although I like the principle of bitboards, it really bothers me that I can't
>>>>>seem to find a decent/fast way to evaluate weighted safe squares.
>>>>>
>>>>>Suppose I want to (simple) evaluate a rook, I generate a bitboard with all
>>>>>reachable squares and mask off the squares attacked by lower pieces (that's no
>>>>>problem).
>>>>>
>>>>>(This doesn't exactly generate safe squares, only the ones that aren't attacked
>>>>>at all by the opponent's pieces are; for the remaining squares one would need a
>>>>>SEE, but that's not the point.)
>>>>>
>>>>>Now I can use this bitboard (say rook on e4), mask the rank state, and look in
>>>>>a precomputed table how this rank state scores on an e rank. No problem.
>>>>>
>>>>>But how to do the files? If I use the rotated board, I need to have the
>>>>>opponent's attack board in this rotated board as well, which would be very
>>>>>costly to compute (i.e. also for the bishops, queens) and very complicated.
>>>>>
>>>>>Any ideas? Am I missing something?
>>>>>
>>>>>BTW, doing a popcount isn't a solution, since it violates the elegance of
>>>>>bitboards (and is slow?)
>>>>>
>>>>>Tony
>>>>
>>>>
>>>>On the Cray there is an elegant solution, but not on X86 so far...
>>>>
>>>>You can create a 64-word vector of "weights". How you compute these is up to
>>>>you. In Cray Blitz I did this as I did the evaluation, figuring out which
>>>>squares were weak, unimportant, strong, useful, painful for opponent, etc.
>>>>After the normal eval, I had a vector of values, one per square for all squares
>>>>on the board. Now I computed the "attack bitmap" for a piece, and stuck that in
>>>>the vector mask register. Now when I sum up the square value vector, it only
>>>>sums the values with a corresponding bit mask of 1, meaning this piece attacks
>>>>that square safely.
>>>
>>>Wow great, a scalar product 64word*64bit.
>>>Was it implemented in hardware or a kind of micro-program?
>>
>>Took a couple of instructions. "vector mask" selects the words you want, you
>>pipe them into a "reduction" operation successively to collapse N words to 1
>>final sum. This "chains" so it takes essentially no extra time to do, which is
>>cute.. :) But no similar facility on non-Cray CPUs to date...
>>
>>>
>>>>
>>>>I obviously don't do that at present, since X86 has no such direct capability
>>>>and the software approach is expensive...
>>>
>>>Thinking about some appropriate SSE2 instructions for that scalar product, e.g.
>>>64 bytes * 64 bit. Four 128-bit (16-byte) xmm registers where each byte is
>>>associated with one bit of the other operand.
>>>
>>>One subtask, maybe the most expensive, is to expand each bit to one byte, so
>>>that 1 becomes 0xff. From one 64-bit word to four 128-bit words.
>>>
>>
>>
>>I hate corresponding with you. I end up with a _headache_ every last time you
>>start that stuff. :)
>
>One way to expand the bits to bytes is with 16-bit word lookups getting 128 bits.
>But I guess there is a smarter, less memory-expensive approach with pure
>register calculation. It's simply a "sign extension" from one to eight bits each
>;-)
>
>Then you have two 64-byte vectors, one with the weights, the other with binary
>masks 0 and -1 for each initial bit.
>
>Inside a byte loop:
>
>  for (i=0, sum = 0; i < 64; i++)
>    sum += weight[i] & mask[i]; // either null or weight[i]
>
>With four xmm registers, assume the expanded bitboard in xmm0..3.
>
>  mov     rax, [weight]  ; load pointer of the aligned weight vector
>  pand    xmm0, xmmword ptr [rax +  0]
>  pand    xmm1, xmmword ptr [rax + 16]
>  pand    xmm2, xmmword ptr [rax + 32]
>  pand    xmm3, xmmword ptr [rax + 48]
>
>Then four times psadbw (Packed Sum of Absolute Differences of Bytes
>Into a Word) with zero:
>
>  pxor    xmm4, xmm4  ; zero, may be scheduled a bit earlier
>  psadbw  xmm0, xmm4
>  psadbw  xmm1, xmm4
>  psadbw  xmm2, xmm4
>  psadbw  xmm3, xmm4
>
>  paddd   xmm0, xmm1
>  paddd   xmm2, xmm3
>  paddd   xmm2, xmm0  ; two final sums, one in each 64-bit word
>
>  movdqa  xmm0, xmm2  ; copy both partial sums
>  punpckhqdq xmm0, xmm0 ; bring the high quadword's sum down to the low quadword
>  paddd   xmm0, xmm2  ; final sum in the low dword
>
>  movq    rax, xmm0   ; should be avoided, better pass through memory
>
>msc for AMD64 has appropriate intrinsics.
>
>I am curious how fast that will be after WCCC on my new AMD64 Shuttle box.
>That works still (with xmm0..xmm7) in 32-bit mode with my current 32-bit
>development tools.

I am curious why your head doesn't explode. :)
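
For anyone who wants to try the idea without SSE2 at hand, here is a minimal portable C sketch of the two steps discussed above: expanding bits to 0x00/0xFF bytes with pure register arithmetic, then the masked byte sum. This is not Gerd's code and not what Cray Blitz did; the function names (expand8, weighted_sum, weighted_sum_reference) are made up for illustration, weights are assumed to be unsigned bytes, and the expansion uses an ordinary multiply-and-mask trick on 8 bits at a time rather than xmm registers.

```c
#include <stdint.h>

/* Expand the 8 bits of b into 8 bytes of a 64-bit word:
 * bit j of b becomes 0xFF in byte j (counted from the low end), else 0x00. */
static uint64_t expand8(uint8_t b)
{
    /* Replicate b into every byte, then keep only bit j in byte j. */
    uint64_t x = ((uint64_t)b * 0x0101010101010101ULL) & 0x8040201008040201ULL;
    /* Each byte is now 0 or a single bit; adding 0x7F sets the top bit
     * of exactly the nonzero bytes, with no carry between bytes. */
    x = ((x + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
    /* Turn each 0x01 byte into 0xFF. */
    return x * 0xFFULL;
}

/* Sum weight[i] over all squares i whose bit is set in bb,
 * using the "weight & mask" form from the byte loop above. */
static unsigned weighted_sum(const uint8_t weight[64], uint64_t bb)
{
    unsigned sum = 0;
    for (int r = 0; r < 8; r++) {
        uint64_t m = expand8((uint8_t)(bb >> (8 * r)));
        for (int j = 0; j < 8; j++)
            sum += weight[8 * r + j] & (uint8_t)(m >> (8 * j));
    }
    return sum;
}

/* Straightforward bit-test version, only for checking the one above. */
static unsigned weighted_sum_reference(const uint8_t weight[64], uint64_t bb)
{
    unsigned sum = 0;
    for (int i = 0; i < 64; i++)
        if ((bb >> i) & 1ULL)
            sum += weight[i];
    return sum;
}
```

Calling weighted_sum(weight, safe_attacks) with a 64-entry byte table and a bitboard of safe squares should give the same number as the reference loop; whether the branch-free form is actually faster than a plain bit scan is something one would have to measure.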
Last modified: Thu, 15 Apr 21 08:11:13 -0700