Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Floodfilling attacks

Author: Gerd Isenberg

Date: 06:31:06 10/25/02

Go up one level in this thread


On October 25, 2002 at 07:55:47, Matthias Gemuh wrote:

>
>Hi Experts,
>
>I tried to floodfill attacks with the knowledge+code discussed in this forum.
>
>Is this resulting code wrong ? (seems to work fine).
>Very slow ! Any speed-up ideas ? Gerd Isenberg's asm was not fully
>accepted by my Borland C++ Builder 5. Thanks.
>
>/Matthias.
>
>
>#define SHIFTDOWN(b)      ((b)>>8)
>#define SHIFTUP(b)        ((b)<<8)
>#define SHIFTRIGHT(b)     ((((b)<<1)&0xFEFEFEFEFEFEFEFE))
>#define SHIFTLEFT(b)      ((((b)>>1)&0x7F7F7F7F7F7F7F7F))
>#define SHIFTLEFTUP(b)    ((((b)<<7)&0x7F7F7F7F7F7F7F7F))
>#define SHIFTRIGHTUP(b)   ((((b)<<9)&0xFEFEFEFEFEFEFEFE))
>#define SHIFTLEFTDOWN(b)  ((((b)>>9)&0x7F7F7F7F7F7F7F7F))
>#define SHIFTRIGHTDOWN(b) ((((b)>>7)&0xFEFEFEFEFEFEFEFE))
>
>BITBOARD getBishopAttacks(BITBOARD bishops, BITBOARD freeSquares)
>{
>    BITBOARD attck = 0;
>    BITBOARD rightup   = SHIFTRIGHTUP(bishops);
>    BITBOARD rightdown = SHIFTRIGHTDOWN(bishops);
>    BITBOARD leftup    = SHIFTLEFTUP(bishops);
>    BITBOARD leftdown  = SHIFTLEFTDOWN(bishops);
>    attck |= rightup|rightdown|leftup|leftdown;
>    for (int i=0; i<7; i++) {
>        rightup   = SHIFTRIGHTUP(rightup&freeSquares);
>        rightdown = SHIFTRIGHTDOWN(rightdown&freeSquares);
>        leftup    = SHIFTLEFTUP(leftup&freeSquares);
>        leftdown  = SHIFTLEFTDOWN(leftdown&freeSquares);
>        attck |= rightup|rightdown|leftup|leftdown;
>    }
>    return (attck);
>}
>
>BITBOARD getRookAttacks(BITBOARD rooks, BITBOARD freeSquares)
>{
>    BITBOARD attck = 0;
>    BITBOARD left  = SHIFTLEFT(rooks);
>    BITBOARD right = SHIFTRIGHT(rooks);
>    BITBOARD up    = SHIFTUP(rooks);
>    BITBOARD down  = SHIFTDOWN(rooks);
>    attck |= left|right|up|down;
>    for (int i=0; i<7; i++) {
>        left  = SHIFTLEFT(left&freeSquares);
>        right = SHIFTRIGHT(right&freeSquares);
>        up    = SHIFTUP(up&freeSquares);
>        down  = SHIFTDOWN(down&freeSquares);
>        attck |= left|right|up|down;
>    }
>    return (attck);
>}
>
>
>BITBOARD getQueenAttacks(BITBOARD queens, BITBOARD freeSquares)
>{
>    BITBOARD attck = 0;
>    BITBOARD rightup   = SHIFTRIGHTUP(queens);
>    BITBOARD rightdown = SHIFTRIGHTDOWN(queens);
>    BITBOARD leftup    = SHIFTLEFTUP(queens);
>    BITBOARD leftdown  = SHIFTLEFTDOWN(queens);
>    BITBOARD left  = SHIFTLEFT(queens);
>    BITBOARD right = SHIFTRIGHT(queens);
>    BITBOARD up    = SHIFTUP(queens);
>    BITBOARD down  = SHIFTDOWN(queens);
>    attck |= rightup|rightdown|leftup|leftdown|left|right|up|down;
>    for (int i=0; i<7; i++) {
>        rightup   = SHIFTRIGHTUP(rightup&freeSquares);
>        rightdown = SHIFTRIGHTDOWN(rightdown&freeSquares);
>        leftup    = SHIFTLEFTUP(leftup&freeSquares);
>        leftdown  = SHIFTLEFTDOWN(leftdown&freeSquares);
>        left  = SHIFTLEFT(left&freeSquares);
>        right = SHIFTRIGHT(right&freeSquares);
>        up    = SHIFTUP(up&freeSquares);
>        down  = SHIFTDOWN(down&freeSquares);
>        attck |= rightup|rightdown|leftup|leftdown|left|right|up|down;
>    }
>    return (attck);
>}

Hi Matthias,

On 32bit-hardware dumb7fill and even Kogge-Stone is much too slow, except using
64 bit mmx-registers, which requires inline asm with msc++ including processor
pack or intel c++.

How long does your getQueenAttacks take in a loop with 10**9 runs? My
mmx-routine on Athlon XP 2.1+ takes about 51 seconds (no memory access, no
conditional jumps, up to four mmx-instruction in parallel).

One point is to unroll the loops totally, but the lack is that you have too less
register pairs (maximum is three) to do all inside registers and to gain really
from parallelism.

In C(++) i would prefere the routines posted by Russell Reagan recently, using
Kogge-Stone parallel prefix algorithm, introduced by Steffan Westcott.

http://www.talkchess.com/forums/1/message.html?261036

The only advantage of dumb7fill is to use less registers and therefore the
possibility to do more directions in parallel (with 8 64-bit mmx-registers). But
this advantage will disappear with hammer.

Hammer will have 16  64-bit general purpose registers
                 16 128-bit xmm-registers
                  8  64-bit mmx-registers

So it can store up to 56 bitboards (54 if we excluse rsp and rpb) in register
instead of three!

At least the 16 64-bit gp registers should used by the 64-bit c-compiler. The
multimedia registers may only be used via assembler or some special mmx- or
SSE2-intrinsics.

So i would advise you, if you are not able to use the mmx-registerrs yet, to
wait until hammer and appropriate compilers are availible, to use this
algorithms. It's really necessary to do all in registers without any conditional
jumps.

Regards,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.