Author: Dann Corbit
Date: 14:56:44 07/03/01
Go up one level in this thread
On July 03, 2001 at 14:38:27, Landon Rabern wrote:
>On July 03, 2001 at 14:11:06, Dann Corbit wrote:
>
>>On July 02, 2001 at 20:03:40, Landon Rabern wrote:
>>
>>>Why does he shift the bitBoard down and then xor with x-2 and then shift it back
>>>up. I would think it would be faster to make a couple lookup tables like this:
>>>
>>>bitBoard left[64];
>>>bitBoard right[64];
>>>bitBoard j=1;
>>>
>>>for(i=0;i<64;i++)
>>>{
>>> if(i%8==7)
>>> left[i]=0;
>>> else
>>> left[i]=j<<(i+1);
>>> if(i%8==0)
>>> right[i]=0;
>>> else
>>> right[i]=j<<(64-i);
>>>}
>>>
>>>and then to get the bits of where the rook can move horizontally just do
>>>
>>>for left:
>>>
>>>allPieces^(allPieces - left[sq])
>>>
>>>for right:
>>>
>>>reversePieces^(reversePieces - right[sq])
>>>
>>>See the trick with x^(x-2) extends to x^(x-4) , etc.
>>>
>>>Then you can just use two seperate while loops to pull out the actual moves,
>>>there will be an extra & operation incurred since you need to mask with what you
>>>are attacking seperately.
>>>
>>>So you use only a couple K of memory instead of the 128K used by the regular
>>>bitBoard rookMoves[64][256]. Normally you need to do an & and a >> how much
>>>slower would you think an ^ and a - would be? What if done using the MMX
>>>registers?
>>
>>Those operations are about the same speed. If there is a win, it is due to the
>>reduced memory footprint. Especially if the table spills out of cache, there is
>>a big performance penalty for a fetch from RAM.
>>
>>>Maybe it could be faster for horizontal, but how do you get the vertical
>>>bitBoard back to being vertical quickly? And how do you get the 45 degree
>>>rotated back to to where they need to be quickly? hmmm
>>
>>Seems like it might actually be a pretty good idea. However, since he is
>>writing the whole thing in Assembly, I doubt very much if it will ever play
>>chess very well.
>>
>>Twenty thousand lines of C means 40K lines of assembly, at least. None of the
>>handy-dandy control structures, etc. I suspect that those who have written
>>assembly language chess programs (whether chess playing engines or chess
>>position solvers) are either sorry they did it that way or will be sorry later.
>>I base that comment on experience, from having written big piles of assembly,
>>only to see a new chip come out and all that effort [nearly] a complete waste.
>
>
>I think I can get all moves this way for horizontal,vertical, and diagonals.
>Just have to keep a few new bitboards. So these are the ones needed:
>rot0,rot45,rot90,rot135,rot180,rot225,rot270,rot335. Thats 4 new ones from the
>rot0, rot45, rot90, rot135 that I use now.
>
>To get the moves, just do the same as with the horizontal( the n ^ (n- tab[sq]),
>and then and that with the complement of the correctly rotated bitBoard for this
>direction. Then you will have all the moves, but of course they will not be in
>the correct squares since it is rotated. But each different direction is in its
>own while loop, so you know that it is rotated and by how much, so you can do
>the conversion with a simple integer subtraction.
>
>This only gets us moves though since we don't have rotated board of blackPieces
>and whitePieces to AND with to get captures. Since the speed of move generation
>does not matter compared to that of capture generation, this probably would not
>be very fast.
>
>Does anyone see a way to get captures by just having these 4 extra bitBoards? I
>guess could do the move generation thing to find where the first piece in the
>direction and the check the next square off by our delta. Since each direction
>is done seperately you can find the last square before hitting a piece/edge with
>just one LSB or MSB call. But then the check of
>if (BLACK_PIECE(board->square[sq+delta]))
> add to move list
>
>is an extra branch in there, but you have generated all captures without loading
>those big 128K lookup tables into memory.
>
>
>Anyone have an idea if this will be slower/faster? I would think move
>generation would be faster, but capture generation I don't know. Also, the
>extra 4 updates for the new bitBoards could be a factor in the speed of this.
Missed branch predictions are very expensive. A jump table is a good
alternative to think about. The array and pointer version of the test routine
below are between three and four times faster than the switch for some
combinations of compiler/machine. Pretty staggering, when you look at the cost
of those missed branch predictions being far more expensive than the floating
point library calls.
#include<math.h>
#include<stdio.h>
typedef double (*f_t) (double);
static f_t f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
0};
static double accum0 = 0;
static double accum1 = 0;
static double accum2 = 0;
void arr(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
d += f[i] (0.5);
}
accum0 += d;
}
void poi(void)
{
f_t *flist = f;
double d = 0;
while (*flist) {
f_t ff = *flist;
d += ff(0.5);
flist++;
}
accum1 += d;
}
void swi(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
switch (i) {
case 0:
d += f[0] (0.5);
break;
case 1:
d += f[1] (0.5);
break;
case 2:
d += f[2] (0.5);
break;
case 3:
d += f[3] (0.5);
break;
case 4:
d += f[4] (0.5);
break;
case 5:
d += f[5] (0.5);
break;
case 6:
d += f[6] (0.5);
break;
case 7:
d += f[7] (0.5);
break;
case 8:
d += f[8] (0.5);
break;
case 9:
d += f[9] (0.5);
break;
default:
break;
}
}
accum2 += d;
}
int main(void)
{
long i;
for (i = 0; i < 1000000; i++) {
arr();
poi();
swi();
}
printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
return 0;
}
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.