Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: High performance move generation

Author: Dan Newman

Date: 00:47:16 11/16/99

Go up one level in this thread


On November 16, 1999 at 00:26:17, Scott Gasch wrote:

>Hi,
>
>I have been rewriting my engine from the ground up and trying to "do it right"
>this second go 'round.  For instance, I pruned the move data struct from a huge
>56 byte struct down to an int to increase performance ;)
>

In my latest code I'm using C++ classes for move type so that I can play around
with the internal structure of the move without having to (in theory) make
massive changes to the code...  I currently use (and have nearly always used)
an int for the move, but I'd like to try out a struct with separate from, to,
etc. to see what that does to performance.

>I have recently been doing Vincent Diepeveen's test of move generation speed to
>clock how fast my generator is.  In the prior version it generated at about 3
>million nodes/sec whereas in this version (with the much smaller move
>representation) it is up to about 6 million nodes/sec on my AMD K6-3 450.
>
>Vincent's test is to generate all successors of:
>rnbqkbnr/ppp2ppp/8/3pp3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq e6 0 0
>
>in a loop, 2000000 times and keep track of how long it takes... then compute
>nodes/sec.  Vincent gets 15 million in the same position with a slower
>processor!
>

I thought Vincent was doing this on a PII-450.  Is than a lot slower than
the K6-3 450?

>I have profiled my code and determined that the bottleneck is in my generation
>routines (as opposed to AddMove)  So my question is this: I currently generate
>moves for a specific piece something like this:
>
>    //
>    // Figure out how the rook can move.
>    //
>    for (iDir = 0; iDir < 4; iDir++)
>    {
>        iTarg = x + iDelta[iDir];
>        while (ONBOARD(iTarg))
>        {
>            iTargContents = PIECE_AT(psPos, iTarg);
>
>            //
>            // The rook can move if the square is empty.
>            //
>            if (IS_EMPTY(iTargContents))
>            {
>                AddMove(psPos, x, iTarg);
>            }
>
>            //
>            // Or if there is an enemy piece there to capture.
>            //
>            else
>            {
>                if (COLOR(iTargContents) != COLOR(iRook))
>                {
>                    AddMove(psPos, x, iTarg);
>                }
>                break;
>            }
>            iTarg += iDelta[iDir];
>        }
>    }
>
>However I see in GNUChess they use some kind of precomputed table.  I am not
>sure I understand exactly how this works.  Also, does anyone else have any other
>tricks to speed up move generation?  I know that in the long run this really
>doesn't make TOO much of a difference but I want my new version to be lean and
>mean...
>
>As always, I appreciate the help.
>Scott

I'm getting 8.1 million pseudo-legal moves generated per second on
Vincent's position on a P6/200 with my latest move generator--a
non-rotated bitboard move generator.  This move generator does scan
the board for slider move generation, so I guess it's really a sort
of hybrid.  (My previous (0x88) move generator goes at about 5.8 Mm/s.)
I suppose that I might get 18 Mm/s on a 450 MHz processor, but I
haven't tried yet.  The difference in cache could make a difference, so
multiplying by 2.25 probably isn't quite right...

Mostly what I do (to get higher performance) is to unroll loops, do
as little if-testing as I can get away with, and (attempt to) reduce
the number of memory accesses.  Many of these things tend to trade code
compactness for speed, so there is a point at which the return can be
negative.

In my 0x88 move generator I use piece lists so that I don't have to
scan the board for the STM's pieces and a switch statement to
handle move generation for different piece types rather than having
arrays of offset values and so forth indexed by piece type or direction.
This means the move generator isn't very compact, but it avoids doing
array lookups for offset values by directly embedding them in the code:

    while( *piece ) {
        switch( piece_type( *piece) ) {
          case PAWN:
            // move gen code for pawn...
            break;
            ...
          case BISHOP: {
              int from = piece_coord( *piece);
              // NE dir
              int to = from + 17;
              while( (to & 0x88) == 0 ) {
                  if( board[to] ) break;
                  *movep++ = from | (to << 7);
                  to += 17
              }
              // NW dir
              // etc...
              break;
          }
          ...
          case KING:
            // move gen code for king
            break;
        }
        piece++;
    }

Another thing I do is have completely separate capture and non-capture
move generators.  This actually hurts on Vincent's test since I have the
overhead of calling two generators and must do many things twice, but in
actual use the capture generator is called far more often than the
non-capture generator.  This is because I use only the capture generator
in the quiescence search, and in the full width search a capture quite
often is the cause of a cutoff preventing the non-capture generator from
getting called...  So, it likely saves some cycles to separate the move
generation like this.  It also makes move ordering easier.

-Dan.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.