Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: 0x88 findings

Author: Gerd Isenberg
Date: 07:25:29 02/23/06
On February 22, 2006 at 18:16:04, h.g.muller wrote:

>By the way, this discussion brings to my attention that it is possible to make a
>very fast move generation that is virtually branchless, on the philosophy:
>"branches that are not in your code can't be mis-predicted".
>
>The donside of this philosophy is that every iteration of the move-generator
>loop would always (conditionally) have to do all things that normally the three
>loops of your move generator (over pieces, directions and squares along the ray)
>do: get the next piece, take the new direction step-vector from a table,
>initialize the to-square to the position of the piece, step the to-square, look
>hat is on it. But if you realize that every mispredicted branch costs you 11
>cycles, and that you can do 3 intructions per cycle, this is a very small price
>indeed.
>
>The principal data structure would in addition to a board, which contains piece
>numbers, and a piece list, which contains square numbers, would be a list of
>rays. For every ray of every piece there would be a separate entry in this list,
>i.e. a full complement of pieces would have 8x3 rays for the Pawns, 4x8 rays for
>KQNN and 4x4 rays for RRBB, 72 in total for each side. The single loop of the
>move generator would step through this list, thus servicing all direction of all
>pieces. The ray list would be a linked list, every entry specifying its
>successor, so that captured pieces can be easily removed.
>
>For ray number i, piece[i] would contain the piece (its number) to which the ray
>belongs, vec[i] the step on the board for the direction the ray runs in, and
>mask[i] would contain status bits that tell things like if this is a ray of a
>sliding piece, if captures along this ray are allowed, or if normal moves are
>allowed. All this is initialized once and for all during startup. next[i] would
>contain the link to the next ray, initialied as ray+1, and updated by DO_MOVE /
>UNDO_MOVE on captures to remove the rays of the no-longer-existing piece.
>
>The piece numbers would be chosen such that the color of the piece can be easily
>deduced from the number, like 0x10-0x1f for white, and 0x20-0x2f for black.
>Empty squares on the board are indicated by 0x40, inaccessable guard-band
>squares by 0x80. All variables are single bytes (unsigned char).
>
>The move generation relies heavily on the exitence of conditional move
>istructions CMOV, that allow you to do a conditional assignment 'if(A)B=C;'
>without using a branch instruction. These instructions perform the assignment or
>not, based on the condition, but don't alter the flow of control, and thus don't
>involve branch prediction.
>
>The move generator would look like this:
>
>ray = 0;                       /* first ray */
>new = 1;                       /* is new one */
>do{
>    p = piece[ray];            /* piece to which this ray belongs */
>    from[msp] = pos[p];        /* put move on move stack */
>    if(new) y = pos[p];        /* from-square */
>    step = vec[ray];           /* board step for this ray */
>    y += step;                 /* step to-square along ray */
>    to[msp] = y;               /* put move on move stack */
>    invalid = (side_to_move ^ board[y]) & mask[ray];
>                               /* check if contents of to-square compatible with
>this direction */
>    if(!invalid) msp++;        /* accept move */
>    new = invalid | mask[ray]&NON_SLIDING;
>                               /* end of old ray reached if move was already
>invalid, or non-sliding piece */
>    if(new) ray = next[ray];   /* conditionally go to next ray */
>}while(ray);                   /* until all rays done */
>/* All normal moves now stacked */
>
>The loop needs 5 loads from memory tables (3 ray list 1 board, and 1 piece
>list), 2 stores to the move stack, and 3 condtional moves. If I count right
>there are 10 ALU instructions if we can fit everything in the registers. At the
>end there will be a 'take-always' branch, which is only mispredicted once, at
>the end of the loop. Most current machines can do 1 or 2 memory ops per clock,
>and 2 or 3 ALU ops. So they should be able to run an assembler-optimized version
>of this loop in 5-7 CPU clocks. At 2 GHz that is 3,5 nsec per move, or ~300
>million moves per second. Ray list, piece list and board will aways reside in
>cache, together they occupy about 1KB.

very cute - i thought i was the owner of a branchless movegen patent ;-)
Does gcc or intel c generate cmovs here?

instead of
    if(!invalid) msp++;        /* accept move */
one may also write to force setCC here
    msp += !invalid;
Re: 0x88 findings h.g.muller 09:47:37 02/23/06
- Re: 0x88 findings Gerd Isenberg 10:25:08 02/23/06
  - Re: 0x88 findings h.g.muller 00:41:41 02/24/06
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.