Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboards !! :)

Author: Robert Hyatt

Date: 15:12:15 07/01/04

Go up one level in this thread


On July 01, 2004 at 16:56:50, Gerd Isenberg wrote:

>On July 01, 2004 at 15:49:21, Robert Hyatt wrote:
>
>>On July 01, 2004 at 15:02:34, Gerd Isenberg wrote:
>>
>>>On July 01, 2004 at 11:13:14, Robert Hyatt wrote:
>>>
>>>>On July 01, 2004 at 02:50:35, Tony Werten wrote:
>>>>
>>>>>Hi all,
>>>>>
>>>>>although I like the principle of bitboards, it really bothers me that I can't
>>>>>seem to find a decent/fast way to evaluate weighted safe squares.
>>>>>
>>>>>Suppose I want to (simple) evaluate a rook, I generate a bitboard with all
>>>>>reachable squares and mask off the squares attacked by lower pieces (that's no
>>>>>problem).
>>>>>
>>>>>(This doesn't exacly generate safe squares, only the ones that aren't attacked
>>>>>at all by opponents pieces are, for the remaining squares one would need a SEE,
>>>>>but that's not the point )
>>>>>
>>>>>Now I can use this bitboard ( say rook on e4 ), mask the rank state, and look in
>>>>>a precomputed table how this rankstate scores on an e rank. No problem.
>>>>>
>>>>>But how to do the files ? If I use the rotated board, I need to have the
>>>>>opponents attackboard in this rotated board as well, wich would be very costly
>>>>>to compute (ie also for the bishops,queens ) and very complicated.
>>>>>
>>>>>Any ideas ? Am I missing something ?
>>>>>
>>>>>BTW, doing a popcount isn't a solution, since it violates the elegance of
>>>>>bitboards ( and is slow ?)
>>>>>
>>>>>Tony
>>>>
>>>>
>>>>On the Cray there is an elegant solution, but not on X86 so far...
>>>>
>>>>You can create a 64-word vector of "weights".  How you compute these is up to
>>>>you.  In Cray Blitz I did this as I did the evaluation, figuring out which
>>>>squares were weak, unimportant, strong, useful, painful for opponent, etc.
>>>>After the normal eval, I had a vector of values, one per square for all squares
>>>>on the board.  Now I computed the "attack bitmap" for a piece, and stuck that in
>>>>the vector mask register.  Now when I sum up the square value vector, it only
>>>>sums the values with a corresponding bit mask of 1, meaning this piece attacks
>>>>that square safely.
>>>
>>>Wow great, a scalar product 64word*64bit.
>>>Was it implemented in hardware or a kind of micro-program?
>>
>>Took a couple of instructions.  "vector mask" selects the words you want, you
>>pipe them into a "reduction" operation successively to collapse N words to 1
>>final sum.  This "chains" so it takes essentially no extra time to do, which is
>>cute.. :)  But no similar facility on non-cray cpus to date...
>>
>>
>>
>>
>>>
>>>>
>>>>I obviously don't do that at present, since X86 has no such direct capability
>>>>and the software approach is expensive...
>>>
>>>Thinking about some oppropriate SSE2-instructions for that scalar product, eg.
>>>64 bytes * 64 bit. Four 128-bit (16Byte) xmmm registers where each byte is
>>>associated with one bit of the other operand.
>>>
>>>One subtask, may be the most expensive, is to expand each bit to one byte, so
>>>that 1 becomes 0xff. From 64-bit word to four times 128-bit words.
>>>
>>
>>
>>I hate corresponding with you.  I end up with a _headache_ every last time you
>>start that stuff.  :)
>
>One way to expand the bit to bytes is with 16-bit word lookups getting 128 bits.
>But i guess there is a smarter, less memory expensive approach with pure
>register calculation. It's simply a "sign extension" from one to eight bits each
>;-)
>
>Than you have two 64 byte vectors, one with the weights, the other with binary
>masks 0 and -1 for each initial bit.
>
>Inside a byte loop:
>
>  for (i=0, sum = 0; i < 64; i++)
>     sum += weight[i] & mask[i]; // either null or weight[i]
>
>With four xmm registers, assume the expanded bitboard in xmm0..3.
>
>  mov  rax, [weight] ; load pointer of the aligned weight vector
>  pand xmm0, xmm ptr [rax + 0]
>  pand xmm1, xmm ptr [rax + 16]
>  pand xmm2, xmm ptr [rax + 32]
>  pand xmm3, xmm ptr [rax + 48]
>
>Then four times psadbw (Packed Sum of Absolute Differences of Bytes
>Into a Word) with zero:
>
>  pxor   xmm4, xmm4 ; zero, may be scheduled a bit earlier
>  psadbw xmm0, xmm4
>  psadbw xmm1, xmm4
>  psadbw xmm2, xmm4
>  psadbw xmm3, xmm4
>
>  paddd  xmm0, xmm1
>  paddd  xmm2, xmm3
>  paddd  xmm2, xmm0 ; two final sums in each 64-bit word
>
>  PUNPCKHQDQ xmm0, xmm2
>  paddd  xmm0, xmm2 ; final sums
>
>  movq   rax, xmm0  ; should be avoided, better pass through memory
>
>msc for AMD64 has appropriate intrinsics.
>
>I am curious how fast that will be after WCCC on my new AMD64 Shuttle box.
>That works still (with xmm0..xmm7) in 32-bit mode with my current 32-bit
>development tools.


I am curious why your head doesn't explode.

:)



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.