Computer Chess Club Archives


Subject: Re: Why Rotated Indices are better for diagonals

Author: Gerd Isenberg

Date: 07:18:39 01/17/04

On January 17, 2004 at 07:14:14, Alessandro Damiani wrote:

>On January 17, 2004 at 06:09:17, Gerd Isenberg wrote:
>
>>On January 16, 2004 at 15:51:42, Gerd Isenberg wrote:
>>
>>>On January 16, 2004 at 14:26:32, Alessandro Damiani wrote:
>>>
>>>>
>>>>>I was asking because I'm generating all attacks in all positions (also at the
>>>>>leaf nodes), which takes a significant amount of time. I thought my
>>>>>looping-over-bitboards was terribly slow, so I replaced it by the Kogge-Stone
>>>>>stuff that was posted here a while ago, and it got even slower ;-)
>>>>>So the next thing to try would be rotated bitboards, but it's rather a lot of
>>>>>work to implement, I'm afraid.
>>>>
>>>>When I started working on my engine years back I first implemented the Chess 4.0
>>>>style attack tables. Then I read about Bob's Rotated Bitboards. I liked the
>>>>idea, but found keeping updated rotated representations of the board too
>>>>complicated. I therefore developed my own way of determining the attacked
>>>>squares. I call it Rotated Indices. As its name suggests, it keeps the indices
>>>>used to access the attack tables updated. In contrast to Rotated Bitboards, you
>>>>don't have to extract the index from the rotated bitboards; it is already
>>>>there. The code is very simple.
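
As a minimal sketch of that idea (illustrative names only, not the actual
Fortress code): one occupancy byte per line is kept up to date on make/unmake
and is used directly as the attack-table index.

typedef unsigned long long BitBoard;

BitBoard      rankAttacks[64][256];   // precalculated attack sets per square and rank state
unsigned char slideIndexH[8];         // one occupancy byte per rank, kept up to date

void setPiece(int sq)   { slideIndexH[sq >> 3] |=  (unsigned char)(1 << (sq & 7)); }
void clearPiece(int sq) { slideIndexH[sq >> 3] &= ~(unsigned char)(1 << (sq & 7)); }

// the index is already there - no extraction from a rotated occupied board needed
BitBoard horizontalAttacks(int sq) {
    return rankAttacks[sq][slideIndexH[sq >> 3]];
}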
>>>>
>>>>If you really want to implement Rotated Bitboards then you have to do it;
>>>>waiting doesn't help. ;-) But if you don't want to spend more than two hours
>>>>on the implementation, then you may have a look at Rotated Indices. You can find
>>>>the source code on my homepage under "Chess programming: Bitboards in Fortress":
>>>>http://freeweb.econophone.ch/fortress
>>>>
>>>>Alessandro
>>>>        ....an indentation in space
>>>
>>>After looking at your source for a while, I think your code is very similar to
>>>mine, except for the data density of your indices and the identifiers ;-)
>>>
>>>// slide indices:
>>>unsigned int SlideIndexH[8],     // horizontal
>>>             SlideIndexV[8],     // vertical
>>>             SlideIndexRU[16],   // right up (RU)
>>>             SlideIndexRD[16];   // right down (RD)
>>>
>>>I use(d) something like this:
>>>
>>>// slide indices == rotated bitboards:
>>>unsigned char SlideIndexH[8],     // horizontal
>>>              SlideIndexV[8],     // vertical
>>>              SlideIndexRU[8],   // right up (RU)
>>>              SlideIndexRD[8];   // right down (RD)
>>>
>>>But with a bitboard declaration instead of BYTE[8] (what about a union?).
>>>The only trick is to pack the 15 diagonals byte-aligned into eight bytes and
>>>address them in this manner:
>>>
>>>rightupIdx   = (sq-Rank(sq)) & 7;
>>>rightDownIdx = (~sq-Rank(sq)) & 7;
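
Putting that together as a sketch (my illustration only, assuming
Rank(sq) == sq >> 3; the union answers the parenthetical question above):

typedef unsigned long long BitBoard;

typedef union {                  // eight packed occupancy bytes, also viewable as one bitboard
    BitBoard      board;
    unsigned char byte[8];
} SlideIndexDiag;

// two anti-parallel diagonals share one byte, e.g. a8-a8 and b1-h7 both map to 1
int rightUpIdx(int sq)   { return ( sq - (sq >> 3)) & 7; }
int rightDownIdx(int sq) { return (~sq - (sq >> 3)) & 7; }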
>>>
>>>My drawback was the byte access with zero-extension to 32 bits.
>>>Your drawback is having to update four SlideIndices in addition to the
>>>non-rotated one, whereas I had only three additional ones.
>>>
>>>Gerd
>>
>>In my former approach, as well as in Bob's, both with packed diagonals in 45- or
>>135-degree rotated occupied boards, there are side effects from one diagonal to
>>another which may lead to worse cache efficiency.
>>
>>1 2 3 4 5 6 7 0
>>2 3 4 5 6 7 0 1
>>3 4 5 6 7 0 1 2
>>4 5 6 7 0 1 2 3
>>5 6 7 0 1 2 3 4
>>6 7 0 1 2 3 4 5
>>7 0 1 2 3 4 5 6
>>0 1 2 3 4 5 6 7
>>^a1
>>
>>E.g. with my mapping, the rightUp access for square a8 (ray a8-a8) depends on
>>the state of all squares of the diagonals with index 1, that is a8-a8 and b1-h7.
>>
>>Therefore each state change on b1-h7 may (though not necessarily) change the
>>address of
>>
>>a1h8Attacks[a8][state of diagonal 1]
>>
>>... even though its content is of course the same, an empty attack set for this
>>particular square and diagonal.
>>
>>Due to such state side effects from "complement" or neighbouring diagonals, many
>>more elements of the huge attack array are addressed than necessary, which may
>>lead to more cache pollution.
>>
>>Your unique diagonal index scheme 0..14 doesn't have this problem. In Bob's
>>approach for diagonals, it may be worth introducing a second lookup to take the
>>length of the diagonals into account, by ANDing not with 63 but with
>>diaMaskRU[sq], containing the values 1, 3, 7, 15, 31, 63.
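
A rough sketch of what such a second lookup might look like (illustration only;
the shift table, the occupancy name and the 45-degree details are my
assumptions, not Bob's actual code):

extern unsigned long long occupiedRU;      // 45-degree rotated occupied board (assumed name)
extern int                diaShiftRU[64];  // shift of the square's diagonal (assumed)
extern unsigned char      diaMaskRU[64];   // 1, 3, 7, 15, 31 or 63, matching the diagonal length
extern unsigned long long a1h8Attacks[64][64];

unsigned long long attacksRU(int sq) {
    // mask with the diagonal's own length instead of a fixed "& 63"
    unsigned state = (unsigned)(occupiedRU >> diaShiftRU[sq]) & diaMaskRU[sq];
    return a1h8Attacks[sq][state];
}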
>>
>>One possible slight improvement in your approach is a lookup in precalculated
>>tables to save some calculations for the diagonals, even if they are cheap:
>>
>>(s>>3)+(s&7)
>>7-(s>>3)+(s&7)
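
Such tables could be filled once at startup, e.g. (the scRayIndexRU name matches
the define further down; the RD counterpart is my assumption):

unsigned char scRayIndexRU[64], scRayIndexRD[64];

void InitRayIndexTables(void) {
    int s;
    for (s = 0; s < 64; s++) {
        scRayIndexRU[s] = (unsigned char)((s >> 3) + (s & 7));      // rank + file, 0..14
        scRayIndexRD[s] = (unsigned char)(7 - (s >> 3) + (s & 7));  // 7 - rank + file, 0..14
    }
}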
>
>I thought a table look-up would cost more than these simple instructions. It
>seems I am still thinking in terms of old architectures.

No, it's negligible and may not be worth thinking about.
I'm also a fan of direct register computation.
Anyway, with such small and very often used tables you are likely to get
first-level cache hits (but of course, 256 bytes here and 256 bytes there...).

The computed version makes the code 3 or 4 instructions longer (versus one
additional mov for the lookup), so if your macros are expanded very often, a
lookup may pay off, especially if the register pressure is high.

With AMD64 things may change again, due to more registers being available and
the longer encoding of absolute addresses.

#define AttackDiagRightUp(s) (AFBrightup[s][SlideIndexRU[(s>>3)+(s&7)]])
                                                          ^      ^
                                       Isn't that dangerous with macros?

I suggest further conditional defines or inlines:

#ifdef RAY_INDEX_PER_COMPUTATION
#define RayIndexRU(s) (((s)>>3)+((s)&7))
#else
#define RayIndexRU(s) scRayIndexRU[s]
#endif

#define AttackDiagRightUp(s) (AFBrightup[s][SlideIndexRU[RayIndexRU(s)]])
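
Or, as an inline-function sketch of the same thing (again just an illustration,
not the Fortress code); unlike the plain macro it evaluates its argument only
once and needs no extra parentheses:

extern unsigned char scRayIndexRU[64];

static inline unsigned RayIndexRUInline(unsigned s) {
#ifdef RAY_INDEX_PER_COMPUTATION
    return (s >> 3) + (s & 7);   // compute rank + file directly in registers
#else
    return scRayIndexRU[s];      // or fetch it from the small precalculated table
#endif
}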


>
>After finishing the improved version I will put it on my homepage. I plan to do
>this today.
>
>Alessandro


