Author: Gerd Isenberg
Date: 07:25:29 02/23/06
Go up one level in this thread
On February 22, 2006 at 18:16:04, h.g.muller wrote:
>By the way, this discussion brings to my attention that it is possible to make a
>very fast move generation that is virtually branchless, on the philosophy:
>"branches that are not in your code can't be mis-predicted".
>
>The donside of this philosophy is that every iteration of the move-generator
>loop would always (conditionally) have to do all things that normally the three
>loops of your move generator (over pieces, directions and squares along the ray)
>do: get the next piece, take the new direction step-vector from a table,
>initialize the to-square to the position of the piece, step the to-square, look
>hat is on it. But if you realize that every mispredicted branch costs you 11
>cycles, and that you can do 3 intructions per cycle, this is a very small price
>indeed.
>
>The principal data structure would in addition to a board, which contains piece
>numbers, and a piece list, which contains square numbers, would be a list of
>rays. For every ray of every piece there would be a separate entry in this list,
>i.e. a full complement of pieces would have 8x3 rays for the Pawns, 4x8 rays for
>KQNN and 4x4 rays for RRBB, 72 in total for each side. The single loop of the
>move generator would step through this list, thus servicing all direction of all
>pieces. The ray list would be a linked list, every entry specifying its
>successor, so that captured pieces can be easily removed.
>
>For ray number i, piece[i] would contain the piece (its number) to which the ray
>belongs, vec[i] the step on the board for the direction the ray runs in, and
>mask[i] would contain status bits that tell things like if this is a ray of a
>sliding piece, if captures along this ray are allowed, or if normal moves are
>allowed. All this is initialized once and for all during startup. next[i] would
>contain the link to the next ray, initialied as ray+1, and updated by DO_MOVE /
>UNDO_MOVE on captures to remove the rays of the no-longer-existing piece.
>
>The piece numbers would be chosen such that the color of the piece can be easily
>deduced from the number, like 0x10-0x1f for white, and 0x20-0x2f for black.
>Empty squares on the board are indicated by 0x40, inaccessable guard-band
>squares by 0x80. All variables are single bytes (unsigned char).
>
>The move generation relies heavily on the exitence of conditional move
>istructions CMOV, that allow you to do a conditional assignment 'if(A)B=C;'
>without using a branch instruction. These instructions perform the assignment or
>not, based on the condition, but don't alter the flow of control, and thus don't
>involve branch prediction.
>
>The move generator would look like this:
>
>ray = 0; /* first ray */
>new = 1; /* is new one */
>do{
> p = piece[ray]; /* piece to which this ray belongs */
> from[msp] = pos[p]; /* put move on move stack */
> if(new) y = pos[p]; /* from-square */
> step = vec[ray]; /* board step for this ray */
> y += step; /* step to-square along ray */
> to[msp] = y; /* put move on move stack */
> invalid = (side_to_move ^ board[y]) & mask[ray];
> /* check if contents of to-square compatible with
>this direction */
> if(!invalid) msp++; /* accept move */
> new = invalid | mask[ray]&NON_SLIDING;
> /* end of old ray reached if move was already
>invalid, or non-sliding piece */
> if(new) ray = next[ray]; /* conditionally go to next ray */
>}while(ray); /* until all rays done */
>/* All normal moves now stacked */
>
>The loop needs 5 loads from memory tables (3 ray list 1 board, and 1 piece
>list), 2 stores to the move stack, and 3 condtional moves. If I count right
>there are 10 ALU instructions if we can fit everything in the registers. At the
>end there will be a 'take-always' branch, which is only mispredicted once, at
>the end of the loop. Most current machines can do 1 or 2 memory ops per clock,
>and 2 or 3 ALU ops. So they should be able to run an assembler-optimized version
>of this loop in 5-7 CPU clocks. At 2 GHz that is 3,5 nsec per move, or ~300
>million moves per second. Ray list, piece list and board will aways reside in
>cache, together they occupy about 1KB.
very cute - i thought i was the owner of a branchless movegen patent ;-)
Does gcc or intel c generate cmovs here?
instead of
if(!invalid) msp++; /* accept move */
one may also write to force setCC here
msp += !invalid;
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.