Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboards !! :)

Author: Frank E. Oldham

Date: 15:22:37 07/01/04

Go up one level in this thread


On July 01, 2004 at 18:12:15, Robert Hyatt wrote:

>On July 01, 2004 at 16:56:50, Gerd Isenberg wrote:
>
>>On July 01, 2004 at 15:49:21, Robert Hyatt wrote:
>>
>>>On July 01, 2004 at 15:02:34, Gerd Isenberg wrote:
>>>
>>>>On July 01, 2004 at 11:13:14, Robert Hyatt wrote:
>>>>
>>>>>On July 01, 2004 at 02:50:35, Tony Werten wrote:
>>>>>
>>>>>>Hi all,
>>>>>>
>>>>>>although I like the principle of bitboards, it really bothers me that I can't
>>>>>>seem to find a decent/fast way to evaluate weighted safe squares.
>>>>>>
>>>>>>Suppose I want to (simple) evaluate a rook, I generate a bitboard with all
>>>>>>reachable squares and mask off the squares attacked by lower pieces (that's no
>>>>>>problem).
>>>>>>
>>>>>>(This doesn't exacly generate safe squares, only the ones that aren't attacked
>>>>>>at all by opponents pieces are, for the remaining squares one would need a SEE,
>>>>>>but that's not the point )
>>>>>>
>>>>>>Now I can use this bitboard ( say rook on e4 ), mask the rank state, and look in
>>>>>>a precomputed table how this rankstate scores on an e rank. No problem.
>>>>>>
>>>>>>But how to do the files ? If I use the rotated board, I need to have the
>>>>>>opponents attackboard in this rotated board as well, wich would be very costly
>>>>>>to compute (ie also for the bishops,queens ) and very complicated.
>>>>>>
>>>>>>Any ideas ? Am I missing something ?
>>>>>>
>>>>>>BTW, doing a popcount isn't a solution, since it violates the elegance of
>>>>>>bitboards ( and is slow ?)
>>>>>>
>>>>>>Tony
>>>>>
>>>>>
>>>>>On the Cray there is an elegant solution, but not on X86 so far...
>>>>>
>>>>>You can create a 64-word vector of "weights".  How you compute these is up to
>>>>>you.  In Cray Blitz I did this as I did the evaluation, figuring out which
>>>>>squares were weak, unimportant, strong, useful, painful for opponent, etc.
>>>>>After the normal eval, I had a vector of values, one per square for all squares
>>>>>on the board.  Now I computed the "attack bitmap" for a piece, and stuck that in
>>>>>the vector mask register.  Now when I sum up the square value vector, it only
>>>>>sums the values with a corresponding bit mask of 1, meaning this piece attacks
>>>>>that square safely.
>>>>
>>>>Wow great, a scalar product 64word*64bit.
>>>>Was it implemented in hardware or a kind of micro-program?
>>>
>>>Took a couple of instructions.  "vector mask" selects the words you want, you
>>>pipe them into a "reduction" operation successively to collapse N words to 1
>>>final sum.  This "chains" so it takes essentially no extra time to do, which is
>>>cute.. :)  But no similar facility on non-cray cpus to date...
>>>
>>>
>>>
>>>
>>>>
>>>>>
>>>>>I obviously don't do that at present, since X86 has no such direct capability
>>>>>and the software approach is expensive...
>>>>
>>>>Thinking about some oppropriate SSE2-instructions for that scalar product, eg.
>>>>64 bytes * 64 bit. Four 128-bit (16Byte) xmmm registers where each byte is
>>>>associated with one bit of the other operand.
>>>>
>>>>One subtask, may be the most expensive, is to expand each bit to one byte, so
>>>>that 1 becomes 0xff. From 64-bit word to four times 128-bit words.
>>>>
>>>
>>>
>>>I hate corresponding with you.  I end up with a _headache_ every last time you
>>>start that stuff.  :)
>>
>>One way to expand the bit to bytes is with 16-bit word lookups getting 128 bits.
>>But i guess there is a smarter, less memory expensive approach with pure
>>register calculation. It's simply a "sign extension" from one to eight bits each
>>;-)
>>
>>Than you have two 64 byte vectors, one with the weights, the other with binary
>>masks 0 and -1 for each initial bit.
>>
>>Inside a byte loop:
>>
>>  for (i=0, sum = 0; i < 64; i++)
>>     sum += weight[i] & mask[i]; // either null or weight[i]
>>
>>With four xmm registers, assume the expanded bitboard in xmm0..3.
>>
>>  mov  rax, [weight] ; load pointer of the aligned weight vector
>>  pand xmm0, xmm ptr [rax + 0]
>>  pand xmm1, xmm ptr [rax + 16]
>>  pand xmm2, xmm ptr [rax + 32]
>>  pand xmm3, xmm ptr [rax + 48]
>>
>>Then four times psadbw (Packed Sum of Absolute Differences of Bytes
>>Into a Word) with zero:
>>
>>  pxor   xmm4, xmm4 ; zero, may be scheduled a bit earlier
>>  psadbw xmm0, xmm4
>>  psadbw xmm1, xmm4
>>  psadbw xmm2, xmm4
>>  psadbw xmm3, xmm4
>>
>>  paddd  xmm0, xmm1
>>  paddd  xmm2, xmm3
>>  paddd  xmm2, xmm0 ; two final sums in each 64-bit word
>>
>>  PUNPCKHQDQ xmm0, xmm2
>>  paddd  xmm0, xmm2 ; final sums
>>
>>  movq   rax, xmm0  ; should be avoided, better pass through memory
>>
>>msc for AMD64 has appropriate intrinsics.
>>
>>I am curious how fast that will be after WCCC on my new AMD64 Shuttle box.
>>That works still (with xmm0..xmm7) in 32-bit mode with my current 32-bit
>>development tools.
>
>
>I am curious why your head doesn't explode.
>
>:)

I want to see the altivec perm unit code too;
that'll probably only make our eyes cross :-)
Frank



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.