Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: 0x88 findings

Author: h.g.muller

Date: 07:04:24 02/22/06

Forward generation moves on an 0x88 board requires so few instructions that it
is likely to be more important what you do with the move once you have it (how
do you put it in a table, on a move stack or in a linked list, how many bytes of
information do you have to save there in addition to origin and target square,
e.g. castling flags, piece that was captured...) than the generation process
itself. Mispredicted branches will probably be the dominant cost: I measured the
cost of a misprediction on a Celeron M as 11 CPU clocks (and on P4 it is more
like 20...), that is as much as the time it takes to execute 20-30 normal
instructions! The inner loop of move generation only steps a pointer over the
board by a stride hold in another variable, and tests if there is a piece of
your own there. This takes only 7 instructions:

L:  addl  %ead,%eax             // add stepvector %edx to target square %eax
    testl $0x88*4,%eax          // off board?
    jne   OUT
    testl _board(%eax),%ebx     // test piece there for own color %ebx
    je    OUT
    ...
    testl %ecx,%ecx             // range of piece %ecx, 1 = sliding
    jne   L
OUT:

This is 3 branches and 4 ALU instructions, and should execute in 3 CPU clocks as
long as all branch predictions are correct. Totally negligible with the
misprediction time.

So it seems to me the execution time should be totally dominated by the number
of mispredictions. Because of this I am surprised by your result: even if the
switch is done by an indirect jump through a jump table, this is a certain
misprediction. Treating all pieces by the same code while a parameter fetched
from a table determines the piece behavior does not involve any predictions, and
should in general be *much* faster. If you go for speed it would help to also
integrate the pawns into your table-drive part, by having a table that for every
move direction tells the move generator if that move is allowed as a capture or
a normal one. The final branch you will take on the acceptability of the move is
an unpredictable one anyway, because it depends on the contents of the square
you go to. It pays to destill all unpredictability into one branch, and take the
unavoidable hit because of mispredictions only there. E.g. if you use the 0x10
bit in the piece encoding on the board b[] to indicate a black piece and 0x08 to
indicate a white one, and empty squares by 0x20, you can make a table of masks
int m[] for each move direction, that contains 0x28 if the move is only valid as
a capture, 0x18 when it can only be used as a normal move, or 0x08 for both. If
k designates the side to move as 0 for white, 0x18 for black, xor'ing it with
the piece converts its color coding to 0x08 for pieces of the side to move, and
0x10 for the opponent. The validity test for a tentative move to y becomes
'if((k^board[y])&m[direction])break;', and you can combine it with the (also
unpredictable) test for running off the board as 'if(y&0x88 |
(k^board[y])&m[direction])break;'. It gives you slightly more ALU operations (on
which you can reduce further by filling the non-valid part of the board with
pieces of a non-capturable third color indicated by 0x40), but it eliminated
almost all branches except a single unpredictable one.

The in-check test in the initial situation really is not a fair test. This is
the most favorable situation for the backward move generator, where every ray
immediately terminates by running into a piece or into the board edge.
Furthermore, since you test the same position millions of times, the
branch-prediction logic will probably quickly learn the entire branch pattern.

It might be that the starting position is a very unrepresentative test. Most
pieces don't have any moves at all, rays of sliding pieces always terminate
immediately, they always run into pieces of ther own color, never the opponent.
Most branches become completely predictable because of this, while in reality
they will have a totally random pattern. Even with the perft, where you vary the
position on which the move generator works a little bit, does not change enough
in 6 ply that most of this is no longer true. You should try a perft on a deep
middle-game position, with lots of tactics and open spaces.

Re: 0x88 findings h.g.muller 15:16:04 02/22/06
- Re: 0x88 findings Gerd Isenberg 07:25:29 02/23/06
  - Re: 0x88 findings h.g.muller 09:47:37 02/23/06
    - Re: 0x88 findings Gerd Isenberg 10:25:08 02/23/06
      - Re: 0x88 findings h.g.muller 00:41:41 02/24/06
Re: 0x88 findings Reinhard Scharnagl 08:40:11 02/22/06
- Re: 0x88 findings Alessandro Scotti 09:39:19 02/22/06
  - Re: 0x88 findings Reinhard Scharnagl 10:00:22 02/22/06
    - Re: 0x88 findings Alessandro Scotti 12:15:00 02/22/06
      - Re: 0x88 findings Reinhard Scharnagl 12:51:02 02/22/06

This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.