Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Why Rotated Indices are better for diagonals

Author: Robert Hyatt

Date: 08:09:19 01/18/04

Go up one level in this thread


On January 17, 2004 at 06:09:17, Gerd Isenberg wrote:

>On January 16, 2004 at 15:51:42, Gerd Isenberg wrote:
>
>>On January 16, 2004 at 14:26:32, Alessandro Damiani wrote:
>>
>>>
>>>>i was asking because i'm generating all attacks in all positions (also the leaf
>>>>nodes), which takes a significant amount of time. i thought my
>>>>looping-over-bitboards was terribly slow, then i replaced it by the kogge-stone
>>>>stuff that was posted here a while ago, and it got even slower ;-)
>>>>so the next thing to try would be rotated bitboards, but it's rather a lot of
>>>>work to implement, i'm afraid.
>>>
>>>When I started working on my engine years back I first implemented the Chess 4.0
>>>style attack tables. Then I read about Bob's Rotated Bitboards. I liked the
>>>idea, but found keeping updated rotated representations of the board too
>>>complicated. I therefore developed my own way of determining the attacked
>>>squares. I call it Rotated Indices. As its name suggests it keeps updated the
>>>indices used to access the attack tables. In contrast to Rotated Bitboards you
>>>don't have to extract the index out of the rotated bitboards, it is already
>>>there. The code is very simple.
>>>
>>>If you really want to implement Rotated Bitboards then you have to do it,
>>>waiting doesn't help. ;-) But, if you don't want to spend more than two hours
>>>for the implementation then you may have a look at Rotated Indices. You can find
>>>the source code on my homepage under "Chess programming: Bitboards in Fortress":
>>>http://freeweb.econophone.ch/fortress
>>>
>>>Alessandro
>>>        ....eine Einbuchtung im Raum
>>
>>After looking a while to your source, i think your code is very similar to mine,
>>except the data density of your Indices and the identifiers ;-)
>>
>>// slide indices:
>>unsigned int SlideIndexH[8],     // horizontal
>>             SlideIndexV[8],     // vertical
>>             SlideIndexRU[16],   // right up (RU)
>>             SlideIndexRD[16];   // right down (RD)
>>
>>I use(d) something like this:
>>
>>// slide indices == rotated bitboards:
>>unsigned char SlideIndexH[8],     // horizontal
>>              SlideIndexV[8],     // vertical
>>              SlideIndexRU[8],   // right up (RU)
>>              SlideIndexRD[8];   // right down (RD)
>>
>>But with bitboard declaration instead of BYTE[8] (what about a union?).
>>The only trick is to pack the 15 diagonals byte aligned into eight bytes and
>>address them in this manner.
>>
>>rightupIdx   = (sq-Rank(sq)) & 7;
>>rightDownIdx = (~sq-Rank(sq)) & 7;
>>
>>My drawback was the byte access with zero extending to 32-bit.
>>Your drawback is to update four SlideIndices additionally to the none rotated
>>one and i had only three additional ones.
>>
>>Gerd
>
>In my former approach as well in Bob's, both with packed diagonals in 45 or 135
>degree rotated occupied boards, there are sideeffects from one diagonal to
>another which may yield to worse cache efficiency.
>
>1 2 3 4 5 6 7 0
>2 3 4 5 6 7 0 1
>3 4 5 6 7 0 1 2
>4 5 6 7 0 1 2 3
>5 6 7 0 1 2 3 4
>6 7 0 1 2 3 4 5
>7 0 1 2 3 4 5 6
>0 1 2 3 4 5 6 7
>^a1
>
>E.g with my mapping the rightUp access with square a8 (ray a8-a8), is dependent
>on the state of all squares of diagonals with index 1, that is a8-a8 and b1-h7.
>
>Therefore each state change on b1-h7 does (not necessarily) effect the address
>of
>
>a1h8Attacks[a8][state of diagonal 1]
>
>... even if their content is the same of course, an empty attack set for this
>particular square and diagonal.
>
>Due to such state side effects of "complement" or neighboured diagonals, much
>more elements of the huge attack array are addressed than necessary, which may
>yield to more cache pollution.
>
>Your unique diagonal index scheme 0..14 don't has this problem. In Bob's
>approach for diagonals, it may be worth to introduce a second lookup to consider
>the length of the diagonals by not anding with 63 but with
>diaMaskRU[sq] containing values 1,3,7,15,31,63.

I did that when I first wrote this code, but after thinking a while, I tried
the current approach and it was slighly faster, but not by much, probably
because it eliminated the extra memory reference to get the diag length...  I'll
test it again to see if this is still true however.  It might be different with
the smaller tables I now use, as well.

One thought to not forget is that what you are doing is an attempt to
reduce cache usage, by not referencing parts of the bishop attacks that
can be avoided by using the precise diag contents.  But the L2 is still
going to suck in 64-128 bytes depending on whether you use AMD or Intel,
and since the useful attack information is now "gappy" it isn't clear that
this is really going to save anything.  IE if you use the actual diagonal
contents, rather than the 6-bit "super-diagonal" as I currently do, then
you access 1, and skip 63, then access 2 and skip 62, access 4 and skip 60,
etc.  But big chunks of that which is skipped still ends up in L2...

>
>One possible shlight improvement in your approach is a lookup in precalculated
>tables to save some calculations for diagonals, even if they are cheap:
>
>(s>>3)+(s&7)
>7-(s>>3)+(s&7)
>
>Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.