Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: kogge-stone, rotated bitboards, "raytracing" - on 32 or 64-bit platforms

Author: Gerd Isenberg

Date: 09:27:26 09/27/04

Go up one level in this thread


On September 25, 2004 at 06:21:13, martin fierz wrote:

>aloha!
>
>[OT]autumn has hit switzerland, it's cold and raining outside & my girlfriend is
>on a holiday so i have resumed looking at my engine => a zillion questions,
>here's the first[/OT]
>
>when i started programming my chess engine on july 1st 2003, i didn't do much
>research on it at all and started hacking away happily. the result is a "grown",
>undesigned program and i'm wondering about whether i should start some major
>rewrites/cleaning up.
>back then, i chose bitboards as board representation since i figured that if now
>both 0x88 and bitboards are popular, the advent of 64-bit computing would shift
>the balance in the direction of bitboards. back then too, i figured that i
>better get something working instead of worrying too much about the details, and
>didn't bother to understand rotated bitboards, thinking i could always add that
>later. sorry for the lengthy introduction, but i think it may help to understand
>what i want to do/know :-)
>
>today, my engine is ultra-slow in terms of kN/s (about like gothmog, which also
>goes to show that my thing could be much better despite being so slow). one of
>the reasons for the slowness is that in my eval (which i do everywhere in the
>tree) i compute all attacks since this comes in handy for stuff like evaluation
>and SEE. according to my profiler, this attack computation takes most time.
>
>now, here's how i do attacks today, i call it "raytracing":
>
>int64 empty, org, attack, tmp, y;
>
>empty = ~p->occupied;
>tmp = p->wrook;
>
>while (tmp)
>  {
>  org = tmp & -tmp; // starting square of the slider
>  tmp ^= org;       // remove this slider
>  attack = 0;
>
>  x = forward(org); // x traces a ray forward; forward(x) is a macro
>                    // forward(x) <--> (x = x<<8);
>  while(x & empty)
>    {
>    attack |= x;
>    x = forward(x);
>    }
>
>  // attack now contains all attacks in the forward direction of this slider.
>  // the rest of the rook attacks would be similar
>
>  return(attack);
>  }
>
>then there are two other appoaches, the rotated bitboard approach which needs a
>table lookup, and the kogge-stone fills which i believe originate from steffan
>westcott and which gerd propagates here from time to time.
>
>here are my first 3 questions
>
>1) does anybody here have an idea how these 3 ways of generating attacks for
>sliders compare speedwise for both 32-bit and 64-bit processors? (reason: if
>kogge-stone will be comparable to rotated bitboards, i'd rather not bother to
>implement rotated bitboards, and if both are not much better than my current
>code, i'd also rather not bother with a rewrite).
>
>2) in the code above, to make things more readable and compact i could write
>functions like this (same for kogge-stone):
>
>forwardattack(org, empty)
>  {
>  int64 attack;
>  x = forward(org); // x traces a ray forward; forward(x) is a macro
>                    // forward(x) <--> (x = x<<8);
>  while(x & empty)
>    {
>    attack |= x;
>    x = forward(x);
>    }
>  return attack;
>  }
>
>and then put together my rook attack function as
>
>for-all-rooks-do...
>  attack =
>forwardattack(rooksquare)|backwardattack(rooksquare)|leftattack(rooksquare)|rightattack(rooksquare);
>
>this would be much much cleaner instead of writing out the code all the time for
>all attackers.
>
>question: would that be slower than writing it out? will the compiler inline
>such functions anyway?
>
>3) the ray-tracing method is probably slower, but one of the benefits is that
>you can continue ray-tracing once you hit an occupied square, so you can get
>pins or x-ray-attacks. i don't see how to get such attacks easily in rotated
>bitboard or kogge-stone formalism. am i missing something?
>
>cheers
>  martin
>
>


Hi Martin,

your raytrace algorithm is similar to my unrolled dumb7fill:

unsigned int64 empty, attack, y;
// unsigned is important for right shifts!
// otherwise signed right shift may produce wrong results,
//  e.g. for shifting down a rook on h8 (== 63).

 empty = ~occupied;
 // 1. fill
 y       = wrooks << 8;		//  1
 attack  = y;
 // 2. fill
 y       = (y & empty)<<8;	//  2
 attack |= y;			//  1
 // 3. fill
 y       = (y & empty)<<8;	//  2
 attack |= tmp;			//  1
 // 4...6. fill
 ....
 // 7. fill
 y       = (y & empty)<<8;	//  2
 attack |= y;			//  1
				// 19 shifts/and/or

As you can see, the difference is that dumb7fill is branchless, but may generate
all foreward attacks of all rooks (and even all queens) of one color
simultaniously.

The advantage with this branchless code is, if enough registers are available,
several directions may be processed in parallel.
Even with a single piece, trying to safe some cycles here by looking
if y == zero ==> break, is absolutely contra productive. Such fill stuff is
intended to keep all cpu-resources busy - without any miss-predicted branch.

Ok, if you intend to raytrace pinned pieces and xrays in the same run, you may
try it, but i bet on AMD64, using two Kogge-Stones as Russell suggested is
faster to detect xrays.

For pinned pieces or remove checker, i suggest to get rook/bishop attacks from
the king square as well and to intersect sliders with opposite king direction.

Steffan Westcott's Kogge-Stone algorithm is more or less a parallel prefix
version of dumb7fill, doing n-1 dumb fill-steps in log2(n) steps, using the
pieces as a so called generator (g) and the empty set as propagator (p):

 g |= p & (g << 8);	//  3
 p &= (p << 8);		//  2
 g |= p & (g << 16);	//  3
 p &= (p << 16);	//  2
 g |= p & (g << 32);	//  3
 g  = g << 8;		//  1
                        // 14 shift/and/or instructions

If we compare the number of shift,and,or-instructions of dumb7fill with
Kogge-Stone, it is in favour for Kogge-Stone 14 versus 19.
But Kogge-Stone needs more move-instructions and more registers.
So the advantage is not that big, specially considering wrapped directions.

On x86-32 there are not enough register-pairs, to do Kogge-Stone efficiently.
Most of the time temporary results must be saved/reloaded on/from stack. Even
with eight 64-bit mmx-registers, i use a mixture of dumb7fill and KoggeStone in
my current IsiChess MMX.

--------

While fill-stuff favours working in a pure bitboard-centric world, rotated
bitboards favours a mixed approach, bitboard-centric and move-centric, with
bitscanned square indices.

What IMHO is important if using fill-stuff is to keep all disjoint attacks sets.
They are usefull for a lot of things.

Cheers,
Gerd





This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.