Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: kogge-stone, rotated bitboards, "raytracing" - on 32 or 64-bit platf

Author: Gerd Isenberg

Date: 16:41:24 10/01/04

Go up one level in this thread


On September 28, 2004 at 17:49:21, martin fierz wrote:

>On September 27, 2004 at 17:27:45, Gerd Isenberg wrote:
>
>>On September 25, 2004 at 06:21:13, martin fierz wrote:
>>
>>>aloha!
>>
>>oloha!
>>
>>a bit more specific to your other questions.
>>
>><snip>
>hi gerd,
>
>thanks for your answers. i have some more on this now: i tested my raytracer vs
>kogge-stone in perft in the following way:
>
>tree(depth)
>  {
>  if depth == 0
>     gen_attacks()
>  else
>     make all moves
>       tree(depth-1)
>  }
>
>i did a tree(4) or tree(5) on 3 different positions: once in the starting
>position, once in a typical midgame position, and once in a wide-open position
>with few pawns and all sliders placed so that they have lots of squares to go.
>
>in the start position, gen_attacks with raytracing was much faster than with
>kogge-stone. in the "normal" midgame position it was still significantly faster.
>in the wide-open position, kogge-stone was slightly faster.


Curious about the same compare on x86-64.

On x86-32 Kogge-Stone is too ineffective and due to the huge local memory
traffic, there is not much parallel speedup.

Therefore rather good predicted branches to break a ray early pays off here,
since you really often skip a lot of memory store and loads.

With all generators and propagators in registers doing several directions or
multpiple disjoint generators simultaniously, that is not longer the case. Then
branches avoid parallel work with our deep pipes and multiple cpu ressources.

>
>i was surprised with this result, as i would have expected kogge-stone to do
>fine. i think the answer must be that very often, you get a break in my
>raytracing right on the first step, while kogge-stone has to fill in all
>directions, and waste lots of time e.g. for finding backward attacks of a rook
>on the first rank which don't exist. perhaps one could exclude slider attacks in
>obviously bad directions from being generated at all to start with, i.e. before
>generating backward attacks of rooks, do an & with rows 2,3,4,5,6,7,8. rooks
>have this tendency to live on the edge, so it might speed up things a bit.

That may work for rooks and other pieces during the opening and early
middlegame. But i don't like to blow up my code in that way ;-)

I count on an "parallel single-thread speedup" of at least four or even better
in the future. And even longer pipes - each conditional branch is contra
productive for that speedup. If you have four or eight conditional branches the
probability at least one branch got misspredicted here and there, is worth to do
it branchless, IMHO.

Yes, Kogge-Stone and even more dumb7fill do often a lot of "stupid" work.
But 4 for 1 compensates that a bit. Depending on the design, one may profit from
the other parallel nature of fill-routines - to work setwise (not to mention
bitscanless).

That becomes even more interesting if you feed up a set of safe target squares
of a sliding piece to get a set of progessive mobility - move targets in two
moves.

I consider quad-bitboards a nice data structure, to profit
a) from the setwise fill feature
b) to get disjoint piece wise (usually)
   as well as direction wise attack sets:

A 4-bit generator code for

empty == 0
rook1
remainingRooks (most often rook2 if any)
knight1
remainingknights (most often knight2 if any)
queen(s) (most often one, if any)
bishops (most often one on per square color)
king

for white and black.

For a compact board representation (without pawns here), the piece codes are
arranged in a vertical manner in four bitboards, one bitboards for each code
bit. And to do eight slide direction fills as well as eight knight direction
fills, single shift and mask per knight direction.

possible code idea:
        empty  wrook1 wknight1 ... black queen
slider  0      1      0        ... 1
code0   0      0      1        ... 1
code1   0      0      0        ... 1
black   0      0      0        ... 1

The effort to get the piece wise (meta) attacks is surprisingly small, by
combining (and,xor) four bitboards into bitboards for several pieces/piece
kinds.

>
>i see that i must think a bit more in terms of kogge-stone or bitboard attacks
>in general :-)
>both russel's suggestion of a double-kogge-stone to find pinned pieces and your
>suggestion of a reversed directed attack by opponent king with intersection are
>nice.
>
>the main question for me is whether i should bother to rewrite my engine to
>rotated BB, or to leave it as it is, or to switch to kogge-stone or dumbfill of
>yours (thanks for sharing!), with the 64-bit future in mind. i'm afraid i still
>don't know what to do :-)
>

I see.

You will profit from 64-bit anyway
- rotated bitboards require less register ressources
  and are therefor already quite efficient on 32-bit architectures
  - in opposite to kogge-stone.
- kogge stone requires less memory but more register ressources.

My current MMX-program with fill-stuff is derived from my former rotated
bitboard program, calling attack getters and even getAttackedBy on the fly when
needed.

That's probably not the best way with kogge stone (piece wise and not direction
wise), but due to legal move generation, a specialiced multiple set fill routine
for pinned piece detection and getting a set of king taboo squares, payed off.

The above mentioned quad-bitboard approach requires some intermediate memory to
keep the results for a while for various attack or fork/double attack target
getters.


>BTW, i also looked at a version of my raytracer which calls those small
>diag-attack() and straight-attack() functions, it was significantly slower than
>the version which writes out everything inline. i had my compiler set to "inline
>functions: any suitable", but it wasn't inlining these small functions, another
>surprise! after __forceinline, it got clearly faster but still slower then the
>written-out version.
>

Dumb compiler ;-)

Anyway central and therefor maintainable code is important too.

Cheers,
Gerd

>cheers
>  martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.