Author: Dan Newman
Date: 00:47:16 11/16/99
Go up one level in this thread
On November 16, 1999 at 00:26:17, Scott Gasch wrote: >Hi, > >I have been rewriting my engine from the ground up and trying to "do it right" >this second go 'round. For instance, I pruned the move data struct from a huge >56 byte struct down to an int to increase performance ;) > In my latest code I'm using C++ classes for move type so that I can play around with the internal structure of the move without having to (in theory) make massive changes to the code... I currently use (and have nearly always used) an int for the move, but I'd like to try out a struct with separate from, to, etc. to see what that does to performance. >I have recently been doing Vincent Diepeveen's test of move generation speed to >clock how fast my generator is. In the prior version it generated at about 3 >million nodes/sec whereas in this version (with the much smaller move >representation) it is up to about 6 million nodes/sec on my AMD K6-3 450. > >Vincent's test is to generate all successors of: >rnbqkbnr/ppp2ppp/8/3pp3/3PP3/8/PPP2PPP/RNBQKBNR b KQkq e6 0 0 > >in a loop, 2000000 times and keep track of how long it takes... then compute >nodes/sec. Vincent gets 15 million in the same position with a slower >processor! > I thought Vincent was doing this on a PII-450. Is than a lot slower than the K6-3 450? >I have profiled my code and determined that the bottleneck is in my generation >routines (as opposed to AddMove) So my question is this: I currently generate >moves for a specific piece something like this: > > // > // Figure out how the rook can move. > // > for (iDir = 0; iDir < 4; iDir++) > { > iTarg = x + iDelta[iDir]; > while (ONBOARD(iTarg)) > { > iTargContents = PIECE_AT(psPos, iTarg); > > // > // The rook can move if the square is empty. > // > if (IS_EMPTY(iTargContents)) > { > AddMove(psPos, x, iTarg); > } > > // > // Or if there is an enemy piece there to capture. > // > else > { > if (COLOR(iTargContents) != COLOR(iRook)) > { > AddMove(psPos, x, iTarg); > } > break; > } > iTarg += iDelta[iDir]; > } > } > >However I see in GNUChess they use some kind of precomputed table. I am not >sure I understand exactly how this works. Also, does anyone else have any other >tricks to speed up move generation? I know that in the long run this really >doesn't make TOO much of a difference but I want my new version to be lean and >mean... > >As always, I appreciate the help. >Scott I'm getting 8.1 million pseudo-legal moves generated per second on Vincent's position on a P6/200 with my latest move generator--a non-rotated bitboard move generator. This move generator does scan the board for slider move generation, so I guess it's really a sort of hybrid. (My previous (0x88) move generator goes at about 5.8 Mm/s.) I suppose that I might get 18 Mm/s on a 450 MHz processor, but I haven't tried yet. The difference in cache could make a difference, so multiplying by 2.25 probably isn't quite right... Mostly what I do (to get higher performance) is to unroll loops, do as little if-testing as I can get away with, and (attempt to) reduce the number of memory accesses. Many of these things tend to trade code compactness for speed, so there is a point at which the return can be negative. In my 0x88 move generator I use piece lists so that I don't have to scan the board for the STM's pieces and a switch statement to handle move generation for different piece types rather than having arrays of offset values and so forth indexed by piece type or direction. This means the move generator isn't very compact, but it avoids doing array lookups for offset values by directly embedding them in the code: while( *piece ) { switch( piece_type( *piece) ) { case PAWN: // move gen code for pawn... break; ... case BISHOP: { int from = piece_coord( *piece); // NE dir int to = from + 17; while( (to & 0x88) == 0 ) { if( board[to] ) break; *movep++ = from | (to << 7); to += 17 } // NW dir // etc... break; } ... case KING: // move gen code for king break; } piece++; } Another thing I do is have completely separate capture and non-capture move generators. This actually hurts on Vincent's test since I have the overhead of calling two generators and must do many things twice, but in actual use the capture generator is called far more often than the non-capture generator. This is because I use only the capture generator in the quiescence search, and in the full width search a capture quite often is the cause of a cutoff preventing the non-capture generator from getting called... So, it likely saves some cycles to separate the move generation like this. It also makes move ordering easier. -Dan.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.