Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: that hyperbola project thing

Author: Landon Rabern

Date: 15:08:06 07/03/01

Go up one level in this thread


On July 03, 2001 at 17:56:44, Dann Corbit wrote:

>On July 03, 2001 at 14:38:27, Landon Rabern wrote:
>
>>On July 03, 2001 at 14:11:06, Dann Corbit wrote:
>>
>>>On July 02, 2001 at 20:03:40, Landon Rabern wrote:
>>>
>>>>Why does he shift the bitBoard down and then xor with x-2 and then shift it back
>>>>up.  I would think it would be faster to make a couple lookup tables like this:
>>>>
>>>>bitBoard left[64];
>>>>bitBoard right[64];
>>>>bitBoard j=1;
>>>>
>>>>for(i=0;i<64;i++)
>>>>{
>>>>  if(i%8==7)
>>>>    left[i]=0;
>>>>  else
>>>>    left[i]=j<<(i+1);
>>>>  if(i%8==0)
>>>>    right[i]=0;
>>>>  else
>>>>    right[i]=j<<(64-i);
>>>>}
>>>>
>>>>and then to get the bits of where the rook can move horizontally just do
>>>>
>>>>for left:
>>>>
>>>>allPieces^(allPieces - left[sq])
>>>>
>>>>for right:
>>>>
>>>>reversePieces^(reversePieces - right[sq])
>>>>
>>>>See the trick with x^(x-2) extends to x^(x-4) , etc.
>>>>
>>>>Then you can just use two seperate while loops to pull out the actual moves,
>>>>there will be an extra & operation incurred since you need to mask with what you
>>>>are attacking seperately.
>>>>
>>>>So you use only a couple K of memory instead of the 128K used by the regular
>>>>bitBoard rookMoves[64][256].  Normally you need to do an & and a >> how much
>>>>slower would you think an ^ and a - would be?  What if done using the MMX
>>>>registers?
>>>
>>>Those operations are about the same speed.  If there is a win, it is due to the
>>>reduced memory footprint.  Especially if the table spills out of cache, there is
>>>a big performance penalty for a fetch from RAM.
>>>
>>>>Maybe it could be faster for horizontal, but how do you get the vertical
>>>>bitBoard back to being vertical quickly?  And how do you get the 45 degree
>>>>rotated back to to where they need to be quickly?  hmmm
>>>
>>>Seems like it might actually be a pretty good idea.  However, since he is
>>>writing the whole thing in Assembly, I doubt very much if it will ever play
>>>chess very well.
>>>
>>>Twenty thousand lines of C means 40K lines of assembly, at least.  None of the
>>>handy-dandy control structures, etc.  I suspect that those who have written
>>>assembly language chess programs (whether chess playing engines or chess
>>>position solvers) are either sorry they did it that way or will be sorry later.
>>>I base that comment on experience, from having written big piles of assembly,
>>>only to see a new chip come out and all that effort [nearly] a complete waste.
>>
>>
>>I think I can get all moves this way for horizontal,vertical, and diagonals.
>>Just have to keep a few new bitboards.  So these are the ones needed:
>>rot0,rot45,rot90,rot135,rot180,rot225,rot270,rot335.  Thats 4 new ones from the
>>rot0, rot45, rot90, rot135 that I use now.
>>
>>To get the moves, just do the same as with the horizontal( the n ^ (n- tab[sq]),
>>and then and that with the complement of the correctly rotated bitBoard for this
>>direction.  Then you will have all the moves, but of course they will not be in
>>the correct squares since it is rotated.  But each different direction is in its
>>own while loop, so you know that it is rotated and by how much, so you can do
>>the conversion with a simple integer subtraction.
>>
>>This only gets us moves though since we don't have rotated board of blackPieces
>>and whitePieces to AND with to get captures.  Since the speed of move generation
>>does not matter compared to that of capture generation, this probably would not
>>be very fast.
>>
>>Does anyone see a way to get captures by just having these 4 extra bitBoards?  I
>>guess could do the move generation thing to find where the first piece in the
>>direction and the check the next square off by our delta.  Since each direction
>>is done seperately you can find the last square before hitting a piece/edge with
>>just one LSB or MSB call.  But then the check of
>>if (BLACK_PIECE(board->square[sq+delta]))
>>  add to move list
>>
>>is an extra branch in there, but you have generated all captures without loading
>>those big 128K lookup tables into memory.
>>
>>
>>Anyone have an idea if this will be slower/faster?  I would think move
>>generation would be faster, but capture generation I don't know.  Also, the
>>extra 4 updates for the new bitBoards could be a factor in the speed of this.
>
>Missed branch predictions are very expensive.  A jump table is a good
>alternative to think about.  The array and pointer version of the test routine
>below are between three and four times faster than the switch for some
>combinations of compiler/machine.  Pretty staggering, when you look at the cost
>of those missed branch predictions being far more expensive than the floating
>point library calls.
>
>#include<math.h>
>#include<stdio.h>
>
>typedef double  (*f_t) (double);
>static f_t      f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
>0};
>
>static double   accum0 = 0;
>static double   accum1 = 0;
>static double   accum2 = 0;
>
>
>void            arr(void)
>{
>    int             i;
>    double          d = 0;
>    for (i = 0; f[i]; i++) {
>        d += f[i] (0.5);
>    }
>    accum0 += d;
>}
>
>void            poi(void)
>{
>    f_t            *flist = f;
>    double          d = 0;
>    while (*flist) {
>        f_t             ff = *flist;
>        d += ff(0.5);
>        flist++;
>    }
>    accum1 += d;
>}
>
>void            swi(void)
>{
>    int             i;
>    double          d = 0;
>    for (i = 0; f[i]; i++) {
>        switch (i) {
>        case 0:
>            d += f[0] (0.5);
>            break;
>        case 1:
>            d += f[1] (0.5);
>            break;
>        case 2:
>            d += f[2] (0.5);
>            break;
>        case 3:
>            d += f[3] (0.5);
>            break;
>        case 4:
>            d += f[4] (0.5);
>            break;
>        case 5:
>            d += f[5] (0.5);
>            break;
>        case 6:
>            d += f[6] (0.5);
>            break;
>        case 7:
>            d += f[7] (0.5);
>            break;
>        case 8:
>            d += f[8] (0.5);
>            break;
>        case 9:
>            d += f[9] (0.5);
>            break;
>        default:
>            break;
>        }
>    }
>    accum2 += d;
>}
>
>int             main(void)
>{
>    long            i;
>    for (i = 0; i < 1000000; i++) {
>        arr();
>        poi();
>        swi();
>    }
>    printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
>    return 0;
>}

Yah, that extra branch is the thing I am most concerned with.  Need to find some
way to get the captures without having to do a branch and not having to add more
bitBoards either.  I think this could be a good improvement, hoping for maybe
10%, my move generation takes about 15% of the total time.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.