Author: Landon Rabern
Date: 15:08:06 07/03/01
Go up one level in this thread
On July 03, 2001 at 17:56:44, Dann Corbit wrote:
>On July 03, 2001 at 14:38:27, Landon Rabern wrote:
>
>>On July 03, 2001 at 14:11:06, Dann Corbit wrote:
>>
>>>On July 02, 2001 at 20:03:40, Landon Rabern wrote:
>>>
>>>>Why does he shift the bitBoard down and then xor with x-2 and then shift it back
>>>>up. I would think it would be faster to make a couple lookup tables like this:
>>>>
>>>>bitBoard left[64];
>>>>bitBoard right[64];
>>>>bitBoard j=1;
>>>>
>>>>for(i=0;i<64;i++)
>>>>{
>>>> if(i%8==7)
>>>> left[i]=0;
>>>> else
>>>> left[i]=j<<(i+1);
>>>> if(i%8==0)
>>>> right[i]=0;
>>>> else
>>>> right[i]=j<<(64-i);
>>>>}
>>>>
>>>>and then to get the bits of where the rook can move horizontally just do
>>>>
>>>>for left:
>>>>
>>>>allPieces^(allPieces - left[sq])
>>>>
>>>>for right:
>>>>
>>>>reversePieces^(reversePieces - right[sq])
>>>>
>>>>See the trick with x^(x-2) extends to x^(x-4) , etc.
>>>>
>>>>Then you can just use two seperate while loops to pull out the actual moves,
>>>>there will be an extra & operation incurred since you need to mask with what you
>>>>are attacking seperately.
>>>>
>>>>So you use only a couple K of memory instead of the 128K used by the regular
>>>>bitBoard rookMoves[64][256]. Normally you need to do an & and a >> how much
>>>>slower would you think an ^ and a - would be? What if done using the MMX
>>>>registers?
>>>
>>>Those operations are about the same speed. If there is a win, it is due to the
>>>reduced memory footprint. Especially if the table spills out of cache, there is
>>>a big performance penalty for a fetch from RAM.
>>>
>>>>Maybe it could be faster for horizontal, but how do you get the vertical
>>>>bitBoard back to being vertical quickly? And how do you get the 45 degree
>>>>rotated back to to where they need to be quickly? hmmm
>>>
>>>Seems like it might actually be a pretty good idea. However, since he is
>>>writing the whole thing in Assembly, I doubt very much if it will ever play
>>>chess very well.
>>>
>>>Twenty thousand lines of C means 40K lines of assembly, at least. None of the
>>>handy-dandy control structures, etc. I suspect that those who have written
>>>assembly language chess programs (whether chess playing engines or chess
>>>position solvers) are either sorry they did it that way or will be sorry later.
>>>I base that comment on experience, from having written big piles of assembly,
>>>only to see a new chip come out and all that effort [nearly] a complete waste.
>>
>>
>>I think I can get all moves this way for horizontal,vertical, and diagonals.
>>Just have to keep a few new bitboards. So these are the ones needed:
>>rot0,rot45,rot90,rot135,rot180,rot225,rot270,rot335. Thats 4 new ones from the
>>rot0, rot45, rot90, rot135 that I use now.
>>
>>To get the moves, just do the same as with the horizontal( the n ^ (n- tab[sq]),
>>and then and that with the complement of the correctly rotated bitBoard for this
>>direction. Then you will have all the moves, but of course they will not be in
>>the correct squares since it is rotated. But each different direction is in its
>>own while loop, so you know that it is rotated and by how much, so you can do
>>the conversion with a simple integer subtraction.
>>
>>This only gets us moves though since we don't have rotated board of blackPieces
>>and whitePieces to AND with to get captures. Since the speed of move generation
>>does not matter compared to that of capture generation, this probably would not
>>be very fast.
>>
>>Does anyone see a way to get captures by just having these 4 extra bitBoards? I
>>guess could do the move generation thing to find where the first piece in the
>>direction and the check the next square off by our delta. Since each direction
>>is done seperately you can find the last square before hitting a piece/edge with
>>just one LSB or MSB call. But then the check of
>>if (BLACK_PIECE(board->square[sq+delta]))
>> add to move list
>>
>>is an extra branch in there, but you have generated all captures without loading
>>those big 128K lookup tables into memory.
>>
>>
>>Anyone have an idea if this will be slower/faster? I would think move
>>generation would be faster, but capture generation I don't know. Also, the
>>extra 4 updates for the new bitBoards could be a factor in the speed of this.
>
>Missed branch predictions are very expensive. A jump table is a good
>alternative to think about. The array and pointer version of the test routine
>below are between three and four times faster than the switch for some
>combinations of compiler/machine. Pretty staggering, when you look at the cost
>of those missed branch predictions being far more expensive than the floating
>point library calls.
>
>#include<math.h>
>#include<stdio.h>
>
>typedef double (*f_t) (double);
>static f_t f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
>0};
>
>static double accum0 = 0;
>static double accum1 = 0;
>static double accum2 = 0;
>
>
>void arr(void)
>{
> int i;
> double d = 0;
> for (i = 0; f[i]; i++) {
> d += f[i] (0.5);
> }
> accum0 += d;
>}
>
>void poi(void)
>{
> f_t *flist = f;
> double d = 0;
> while (*flist) {
> f_t ff = *flist;
> d += ff(0.5);
> flist++;
> }
> accum1 += d;
>}
>
>void swi(void)
>{
> int i;
> double d = 0;
> for (i = 0; f[i]; i++) {
> switch (i) {
> case 0:
> d += f[0] (0.5);
> break;
> case 1:
> d += f[1] (0.5);
> break;
> case 2:
> d += f[2] (0.5);
> break;
> case 3:
> d += f[3] (0.5);
> break;
> case 4:
> d += f[4] (0.5);
> break;
> case 5:
> d += f[5] (0.5);
> break;
> case 6:
> d += f[6] (0.5);
> break;
> case 7:
> d += f[7] (0.5);
> break;
> case 8:
> d += f[8] (0.5);
> break;
> case 9:
> d += f[9] (0.5);
> break;
> default:
> break;
> }
> }
> accum2 += d;
>}
>
>int main(void)
>{
> long i;
> for (i = 0; i < 1000000; i++) {
> arr();
> poi();
> swi();
> }
> printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
> return 0;
>}
Yah, that extra branch is the thing I am most concerned with. Need to find some
way to get the captures without having to do a branch and not having to add more
bitBoards either. I think this could be a good improvement, hoping for maybe
10%, my move generation takes about 15% of the total time.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.