Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Resources about rotated bitboards

Author: Gerd Isenberg

Date: 06:04:38 01/16/04

Go up one level in this thread


On January 16, 2004 at 00:29:28, Russell Reagan wrote:

>On January 15, 2004 at 11:01:46, Robert Hyatt wrote:
>
>>I am not sure that my current
>>implementation is "optimal" either.  So even if I did a comparison, I'm not
>>sure it would be valid.
>
>Tim Foden showed an interesting method that was very cache friendly. Something
>like this.
>
>static BYTE s_patMoves[8][256]; // 2.00K [R/F][pattern]
>static UINT64 s_extendVert[256]; // 2.00K [pattern]
>static UINT64 s_extendHorz[256]; // 2.00K [pattern]
>
>static const UINT64 s_rankMask[8]; // 0.06K [file]
>static const UINT64 s_fileMask[8]; // 0.06K [rank]
>static const UINT64 s_diagAMask[15]; // 0.12K [diagA]
>static const UINT64 s_diagHMask[15]; // 0.12K [diagH]
>
>inline int Rank( Sq sq ) { return sq >> 3; }
>inline int File( Sq sq ) { return sq & 0x7; }
>
>inline BYTE PatMoves( int rf, int pat )
>{ return CStatics::s_patMoves[rf][pat]; }
>inline UINT64 ExtVert( int pat ) { return CStatics::s_extendVert[pat]; }
>inline UINT64 ExtHorz( int pat ) { return CStatics::s_extendHorz[pat]; }
>
>inline UINT64 RankMask( int r ) { return CStatics::s_rankMask[r]; }
>inline UINT64 FileMask( int f ) { return CStatics::s_fileMask[f]; }
>
>inline UINT64 CBoard::GenRAttacksBitmap( Sq sq ) const
>{
>    int r = Rank(sq);
>    int f = File(sq);
>    BYTE movesR = PatMoves(f, m_rot[cHorz].bytes[r]);
>    BYTE movesF = PatMoves(r, m_rot[cVert].bytes[f]);
>    UINT64 bitmap = (ExtVert(movesR) & RankMask(r)) |
>        (ExtHorz(movesF) & FileMask(f));
>    return bitmap;
>}
>
>inline int DiagA( int rank, int file ) { return 7 + rank - file; }
>inline int DiagA( Sq sq ) { return DiagA(Rank(sq), File(sq)); }
>inline int DiagH( int rank, int file ) { return rank + file; }
>inline int DiagH( Sq sq ) { return DiagH(Rank(sq), File(sq)); }
>
>inline UINT64 DiagAMask( int a ) { return CStatics::s_diagAMask[a]; }
>inline UINT64 DiagHMask( int h ) { return CStatics::s_diagHMask[h]; }
>
>inline UINT64 CBoard::GenBAttacksBitmap( Sq sq ) const
>{
>    int r = Rank(sq);
>    int f = File(sq);
>    int a = DiagA(r, f);
>    int h = DiagH(r, f);
>    BYTE movesA = PatMoves(f, m_rot[cDiagA].bytes[a & 7]);
>    BYTE movesH = PatMoves(f, m_rot[cDiagH].bytes[h & 7]);
>    UINT64 bitmap = (ExtVert(movesA) & DiagAMask(a)) |
>        (ExtVert(movesH) & DiagHMask(h));
>    return bitmap;
>}
>
>I'm not sure if this is necessarily faster (especially if you have a big cache),
>but it seems like a novel approach. Maybe it will give you some new ideas, if
>you hadn't seen it already.

pretty clever code...

But i guess Bob's lookup code is fastest for AMD64 with 1MB 2nd level cache.
That is really the shortest code, only one shift and "and" per ray, and the
"reduced" lookup tables are most likely in 2nd level cache.


>
>Then there are the reversed bitboards which take advantage of the x^(x-2) trick
>that Gerd uses. The only bad thing about those is that you end up having a
>reversed bitboard for each direction (as opposed to three rotated bitboards), so
>you have significantly more to update.

I don't use any reversed bitboards. Currently i use

     occupied^(occupied-2*rooks)

only for the "positive" rank-direction, using MMX bytewise (rankwise) psubb
(paddb) instruction. That will even work for multiple rooks/queens:

01010110 occupied
00010010 rooks, subset of occupied

01000100 occupied - rooks
00110010 occupied - 2*rooks
01100100 occupied^(occupied-2*rooks)

Other "positive" directions require double ray masking for single pieces.
Therefore other directions i do with KoggeStone (up/down) and dumb7fill.
So x^(x-2) is nice to have, but doesn't gain so much.

>One thing that I kind of like about it is
>that you can generate attacks in single directions (as opposed to generating
>rook attacks in all four directions), which might be useful for some things.

That's one aspect where fill stuff gains something, e.g. for x-rays and pinned
pieces. Loopless capture generation with "better" predictability whether
opposite pieces are attacked on one ray direction or not...

Of couse by doubling the size of lookup-tables that may also be done with
rotated bitboards.

>The
>rotated bitboards are pretty fast, and doing x^(x-2) four or eight times might
>be slower than the lookup tables used by the rotated approach.
>
>And of course, there are the filler routines that Gerd uses. That requires a
>_whole_ new way of thinking about a chess program. It takes a person a while to
>"think bitboards", and then you start from scratch again trying to figure out
>how to get those fillers to work efficiently. I tried it for a while, and...I
>have a new appreciation for Gerd's accomplishments :)

No please, i'm only an obsessed bit frickler ;-)

You really need some bigger (64/128-bit) register files to do it efficiently.
X86-32 with only 8-gp registers is really not the appropriate architecture for
fill algorithms.

>
>Basically my point is, there is probably still nontrivial room for improvement
>with all of these very different approaches to bit twiddling.

It depends a lot on the infrastructure/design principles of the engine.
To avoid BitBoard serialisation and if, to do X&-X instead of BitScan.
Legale move generation, and to keep processed information for eval and movegen
in memory for a while. I partially break the priciple to process things on the
fly when needed, simply to gain more parallelism.

Cheers,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.