Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: that hyperbola project thing

Author: Dann Corbit

Date: 14:56:44 07/03/01

Go up one level in this thread


On July 03, 2001 at 14:38:27, Landon Rabern wrote:

>On July 03, 2001 at 14:11:06, Dann Corbit wrote:
>
>>On July 02, 2001 at 20:03:40, Landon Rabern wrote:
>>
>>>Why does he shift the bitBoard down and then xor with x-2 and then shift it back
>>>up.  I would think it would be faster to make a couple lookup tables like this:
>>>
>>>bitBoard left[64];
>>>bitBoard right[64];
>>>bitBoard j=1;
>>>
>>>for(i=0;i<64;i++)
>>>{
>>>  if(i%8==7)
>>>    left[i]=0;
>>>  else
>>>    left[i]=j<<(i+1);
>>>  if(i%8==0)
>>>    right[i]=0;
>>>  else
>>>    right[i]=j<<(64-i);
>>>}
>>>
>>>and then to get the bits of where the rook can move horizontally just do
>>>
>>>for left:
>>>
>>>allPieces^(allPieces - left[sq])
>>>
>>>for right:
>>>
>>>reversePieces^(reversePieces - right[sq])
>>>
>>>See the trick with x^(x-2) extends to x^(x-4) , etc.
>>>
>>>Then you can just use two seperate while loops to pull out the actual moves,
>>>there will be an extra & operation incurred since you need to mask with what you
>>>are attacking seperately.
>>>
>>>So you use only a couple K of memory instead of the 128K used by the regular
>>>bitBoard rookMoves[64][256].  Normally you need to do an & and a >> how much
>>>slower would you think an ^ and a - would be?  What if done using the MMX
>>>registers?
>>
>>Those operations are about the same speed.  If there is a win, it is due to the
>>reduced memory footprint.  Especially if the table spills out of cache, there is
>>a big performance penalty for a fetch from RAM.
>>
>>>Maybe it could be faster for horizontal, but how do you get the vertical
>>>bitBoard back to being vertical quickly?  And how do you get the 45 degree
>>>rotated back to to where they need to be quickly?  hmmm
>>
>>Seems like it might actually be a pretty good idea.  However, since he is
>>writing the whole thing in Assembly, I doubt very much if it will ever play
>>chess very well.
>>
>>Twenty thousand lines of C means 40K lines of assembly, at least.  None of the
>>handy-dandy control structures, etc.  I suspect that those who have written
>>assembly language chess programs (whether chess playing engines or chess
>>position solvers) are either sorry they did it that way or will be sorry later.
>>I base that comment on experience, from having written big piles of assembly,
>>only to see a new chip come out and all that effort [nearly] a complete waste.
>
>
>I think I can get all moves this way for horizontal,vertical, and diagonals.
>Just have to keep a few new bitboards.  So these are the ones needed:
>rot0,rot45,rot90,rot135,rot180,rot225,rot270,rot335.  Thats 4 new ones from the
>rot0, rot45, rot90, rot135 that I use now.
>
>To get the moves, just do the same as with the horizontal( the n ^ (n- tab[sq]),
>and then and that with the complement of the correctly rotated bitBoard for this
>direction.  Then you will have all the moves, but of course they will not be in
>the correct squares since it is rotated.  But each different direction is in its
>own while loop, so you know that it is rotated and by how much, so you can do
>the conversion with a simple integer subtraction.
>
>This only gets us moves though since we don't have rotated board of blackPieces
>and whitePieces to AND with to get captures.  Since the speed of move generation
>does not matter compared to that of capture generation, this probably would not
>be very fast.
>
>Does anyone see a way to get captures by just having these 4 extra bitBoards?  I
>guess could do the move generation thing to find where the first piece in the
>direction and the check the next square off by our delta.  Since each direction
>is done seperately you can find the last square before hitting a piece/edge with
>just one LSB or MSB call.  But then the check of
>if (BLACK_PIECE(board->square[sq+delta]))
>  add to move list
>
>is an extra branch in there, but you have generated all captures without loading
>those big 128K lookup tables into memory.
>
>
>Anyone have an idea if this will be slower/faster?  I would think move
>generation would be faster, but capture generation I don't know.  Also, the
>extra 4 updates for the new bitBoards could be a factor in the speed of this.

Missed branch predictions are very expensive.  A jump table is a good
alternative to think about.  The array and pointer version of the test routine
below are between three and four times faster than the switch for some
combinations of compiler/machine.  Pretty staggering, when you look at the cost
of those missed branch predictions being far more expensive than the floating
point library calls.

#include<math.h>
#include<stdio.h>

typedef double  (*f_t) (double);
static f_t      f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
0};

static double   accum0 = 0;
static double   accum1 = 0;
static double   accum2 = 0;


void            arr(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        d += f[i] (0.5);
    }
    accum0 += d;
}

void            poi(void)
{
    f_t            *flist = f;
    double          d = 0;
    while (*flist) {
        f_t             ff = *flist;
        d += ff(0.5);
        flist++;
    }
    accum1 += d;
}

void            swi(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        switch (i) {
        case 0:
            d += f[0] (0.5);
            break;
        case 1:
            d += f[1] (0.5);
            break;
        case 2:
            d += f[2] (0.5);
            break;
        case 3:
            d += f[3] (0.5);
            break;
        case 4:
            d += f[4] (0.5);
            break;
        case 5:
            d += f[5] (0.5);
            break;
        case 6:
            d += f[6] (0.5);
            break;
        case 7:
            d += f[7] (0.5);
            break;
        case 8:
            d += f[8] (0.5);
            break;
        case 9:
            d += f[9] (0.5);
            break;
        default:
            break;
        }
    }
    accum2 += d;
}

int             main(void)
{
    long            i;
    for (i = 0; i < 1000000; i++) {
        arr();
        poi();
        swi();
    }
    printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
    return 0;
}



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.