Computer Chess Club Archives


Subject: Re: planning a SSE-optimized chess engine

Author: Aart J.C. Bik

Date: 11:17:01 01/13/05


Hi Gerd,

Thanks for your insights! Well, vectorization in the Intel compiler is my
specialty :-). If you want to quickly learn more about the switches and pragmas
related to vectorization, please refer to the online IDS article at
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm. If you are
interested in much more detail, please also allow me to promote my book on this
subject:

       The Software Vectorization Handbook. Intel Press, June 2004.
       http://www.intel.com/intelpress/sum_vmmx.htm

Having said that, it would be nice if I could show straightforward vectorization
of your code. Alas, things are not that simple (and I hope to get new insights
from this forum). Let’s start with a slight simplification (pre-compute the shift
factors and use a 32-bit bitboard):

unsigned int bits32[64];  /* precomputed shifts */

int dotProduct32(unsigned int bb, unsigned char weight[])
{
 int i;
 unsigned int sum = 0;
#pragma vector aligned   /* <- used assuming weight is 16-byte aligned */
 for (i=0; i < 32; i++) {
    if (bb & bits32[i]) sum += weight[i];
 }
 return sum;
}
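
The post does not show how the shift table is filled. For anyone who wants to
try the loop, here is a minimal sketch that assumes entry i simply holds the
single-bit mask 1u << i. The names bits32_sketch, initBits32Sketch and
dotProduct32Ref are hypothetical and not part of the original code; the
reference function computes the shift inline, so its result can be compared
against the table-driven loop.

```c
/* Hypothetical helper, not from the original post: fills a shift table
   under the assumption that entry i holds the single-bit mask 1u << i. */
static unsigned int bits32_sketch[32];

static void initBits32Sketch(void)
{
    int i;
    for (i = 0; i < 32; i++) {
        bits32_sketch[i] = 1u << i;
    }
}

/* Reference version that computes each shift inline instead of reading
   the table; useful for checking the table-driven loop. */
static int dotProduct32Ref(unsigned int bb, const unsigned char weight[])
{
    int i;
    unsigned int sum = 0;
    for (i = 0; i < 32; i++) {
        if (bb & (1u << i)) sum += weight[i];
    }
    return (int)sum;
}
```

Call initBits32Sketch() once at startup, before the first dot product is
computed.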

The loop in dotProduct32 vectorizes with the Intel compiler (also note that your
“hint” on masking the reduction is not required):

[C:/temp] icl -Fa -Qunroll0 -nologo -QxP -c dot32.c
dot32.c
dot32.c(10) : (col. 2) remark: LOOP WAS VECTORIZED.

In its “rerolled” form (for simplicity I used -Qunroll0), the generated code
looks like:

         <setup>
L:      movdqa    xmm4, XMMWORD PTR _bits32[0+eax*4]
        pand      xmm4, xmm0
        pcmpeqd   xmm4, xmm1
        movd      xmm3, DWORD PTR [eax+edx]
        punpcklbw xmm3, xmm1
        punpcklwd xmm3, xmm1
        add       eax, 4
        cmp       eax, 32
        pandn     xmm4, xmm3
        paddd     xmm2, xmm4
        jb        L
        <compute partial sums>
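
For readers more comfortable with intrinsics than assembly, the listing above
can be approximated by hand in SSE2. This is only a sketch of the same
instruction sequence, not compiler output. It assumes the table holds 1u << i
in entry i, uses unaligned loads so that no alignment guarantee is needed, and
the names bits32_simd, initBits32Simd and dotProduct32Sse2 are hypothetical.
The register names in the comments map each step back to the listing.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <string.h>      /* memcpy */

/* Hypothetical shift table; assumes entry i holds 1u << i. */
static unsigned int bits32_simd[32];

static void initBits32Simd(void)
{
    int i;
    for (i = 0; i < 32; i++) bits32_simd[i] = 1u << i;
}

static int dotProduct32Sse2(unsigned int bb, const unsigned char weight[])
{
    const __m128i zero = _mm_setzero_si128();       /* xmm1: all zeros        */
    const __m128i vbb  = _mm_set1_epi32((int)bb);   /* xmm0: bb in every lane */
    __m128i acc = _mm_setzero_si128();              /* xmm2: four partial sums */
    int lanes[4];
    int i;

    for (i = 0; i < 32; i += 4) {
        __m128i m, w;
        int packed;

        /* Load four masks, AND with bb, compare with zero: a lane becomes
           all ones where the corresponding bit of bb is CLEAR. */
        m = _mm_loadu_si128((const __m128i *)&bits32_simd[i]);
        m = _mm_and_si128(m, vbb);                  /* pand    */
        m = _mm_cmpeq_epi32(m, zero);               /* pcmpeqd */

        /* Load four weight bytes and zero-extend them to 32-bit lanes. */
        memcpy(&packed, weight + i, sizeof packed); /* movd      */
        w = _mm_cvtsi32_si128(packed);
        w = _mm_unpacklo_epi8(w, zero);             /* punpcklbw */
        w = _mm_unpacklo_epi16(w, zero);            /* punpcklwd */

        /* (~m & w) keeps a weight only where the bit is set; accumulate. */
        acc = _mm_add_epi32(acc, _mm_andnot_si128(m, w)); /* pandn, paddd */
    }

    /* Horizontal add of the four partial sums. */
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Note that the compare yields the inverted mask, which is why an and-not is
used instead of a plain and before the add.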


Your 64-bit version, however, gives my vectorizer more headaches. Let me ponder
this some more to see what can be improved in the Intel compiler (looks
like I am back at my job rather than focusing on the chess engine :-).

Sincerely,

Aart Bik
http://www.aartbik.com/



Last modified: Thu, 15 Apr 21 08:11:13 -0700
