Computer Chess Club Archives


Search

Terms

Messages

Subject: How to get nextMove O(N)

Author: Gerd Isenberg

Date: 15:33:36 07/23/04


Is your getNextMove O(N**2)?

Some chess-programs assign score values to generated moves, based on SEE,
history, piece square tables and other heuristics. In upper search regions, far
from the leaves, where move sorting is not a hot spot, one may even use a more
sophisticated scoring for best branching factor.

If a program generates all captures (in quiescense) and all moves near the
leaves and scores it, the getNextMove-routine usually traverses the remaining
moves, looking for the one with the best score.

Even if generating and scoring is more expensive, than picking up the moves in
best order, there are even implementations sorting moves by qsort or similar, to
avoid further overhead. Since early cutoffs may occur, that seems not to be a
good idea, copying moves around for nothing. But at some frontier all-nodes,
comparing scores during getNextMove, one may notice the O(N**2) behaviour due to
N/2*(N-1) compares with some memory traffic. "Random" score compares may lead to
some miss-predictions as well, oups there is a better one.

OTOH there are all-Nodes with only "bad" moves, where sorting doesn't matter at
all. And there are approaches to generate moves in stages, e.g. hashmove and
winning captures and checks...

Anyway, inspired by the recent max-discussion and my SIMD obsession i like to
share some brain storming, specially intended for fast bean counters, where
profiling indicates a few percent in picking up the best move from a sorted move
list ;-)

------------------------------------------------------------------------------

SIMD to the max or how to get getNextMove O(N) ?

Simply by a "brute force" SSE2-SIMD approach with k*N*64 for N <= 64 compares
allways, instead of K*N/2*(N-1), but with k << K.

Is that really worth?
If you prefer portability and no assembly or intrinsics at all - no, stop
reading! If you plan to write a dedicated program for architectures with vector
instructions, like SSE2, AltiVec and Itanium - probably yes, at least some fun.

Following routine (P4,AMD64) finds the maximum signed 16-bit number (short int)
in an array of 64, rather fast. Thanks to the PMAXSW - Packed Maximum Signed
Words - instruction. Something like 9.5ns best case (21 cycles or a few more) on
my Athlon64 2.2GHz.

int getMaxOf64(short int s64[] /* aligned 16 */ )
{
    __asm
    {
        mov     eax,  [s64]         ; or rax in 64-bit mode
        movdqa  xmm1, [eax+0*16]    ; short  0- 7
        movdqa  xmm0, [eax+1*16]    ; short  8-15
        pmaxsw  xmm1, [eax+2*16]    ; short 16-23
        pmaxsw  xmm0, [eax+3*16]    ; short 24-31
        pmaxsw  xmm1, [eax+4*16]    ; short 32-39
        pmaxsw  xmm0, [eax+5*16]    ; short 40-47
        pmaxsw  xmm1, [eax+6*16]    ; short 48-55
        pmaxsw  xmm0, [eax+7*16]    ; short 56-63
        pmaxsw  xmm0, xmm1
        ; horizontal max of all eight words
        movdqa  xmm1, xmm0
        psrldq  xmm0, 2       ; one word right
        pmaxsw  xmm0, xmm1
        movdqa  xmm1, xmm0
        psrldq  xmm0, 4       ; two words right
        pmaxsw  xmm0, xmm1
        movdqa  xmm1, xmm0
        psrldq  xmm0, 8       ; four words right
        pmaxsw  xmm0, xmm1
        pextrw  eax, xmm0, 0  ; extract the maximum word
        cwde                  ; sign extend 16 to 32 bits
    }
}

What about the idea, to get the best index from up to 64 generated and scored
(SEE, history, psqt...) moves. The move list of this approach consists of two
arrays, allocated on the heap or global ones. One contain the moves (16..128bit
per move, depending on the implementation), the other the associated move-score
and bookholding information (signed 16 bit).

Each node contains an at least 16-bit aligned start "nextScore" pointer or an
appropriate index inside the move list, to index all generated move-scores and
associated moves of this node.

The upper ten bits are the signed move score with a range of -512..+511, since
the six remaining lower bits contain the index itself. Therefore, with two equal
scores the one with the "higher" index becoms best.

Each time after moves are generated, the score array gap to 64 is preinitialized
with 0x8000, taking care whether 64 moves are enough, and eventually
"allocating" a bigger range. Thus 0x8000 == Int16min, indicates no generated
move or move is already made.

If the max function leaves a best score greater Int16min, the correspondent
local index (0..63) of the move inside the move list is available in the lower 6
bits of the best score.

To get the idea, see following code fragments,
which hides the implementation as well:

  while ( (move = getNextMove() ).isMove() )
  {
      makeMove(move);
      ....
  }


with

TMOVE getNextMove()
{
   if ( hasMoreThan64 )
   {   // special handling very seldom
       return getNextMoveMoreThan64();
   }

   // default case <= 64 moves most often predicted correctly
   int idx = getMaxOf64(&moveScores[nextScore]);
   if ( idx > -512*64 )
   {  // move found with max score
      moveScores[nextScore + (idx&63)] = -512*64; // tag as done
      return moveList[nextMove + (idx&63)];
   }
   return nilMove;
}

As you can see, there is no bookholding overhead, other than storing int16min
inside it's score array element. No further move swapping is necessary.

Cheers,
Gerd




This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.