Author: Vincent Diepeveen
Date: 19:53:39 02/01/01
Go up one level in this thread
On February 01, 2001 at 22:15:50, Larry Griffiths wrote: >On February 01, 2001 at 21:24:13, Vincent Diepeveen wrote: > >>On February 01, 2001 at 07:12:07, Dan Newman wrote: >> >>>On February 01, 2001 at 03:36:08, David Rasmussen wrote: >>> >>>>On January 31, 2001 at 07:15:59, Dan Newman wrote: >>>> >>>>>On January 30, 2001 at 06:26:17, David Rasmussen wrote: >>>>> >>>>>It just means I don't use rotated bitboards. I have a bitboard for each >>>>>different piece type (12 bitboards) + occupied square bitboards for each >>>>>color (2) + an occupied square bitboard (1). This means make() and undo() >>>>>are a bit cheaper since I don't have to update rotated bitboards. Also, >>>>>there are a few large data structures (arrays of bitboards) that aren't >>>>>needed as well--so fewer memory hits. >>>>> >>>>>I treat the non-sliding pieces more or less like any other bitboard program, >>>>>but for sliding pieces I occasionally resort to scanning the board. I guess >>>>>it might be thought of as a sort of hybrid of bitboard and mailbox. >>>>> >>>>>I suspect (but don't have any hard data) that this is cheaper than doing >>>>>rotated bitboards, and it's also much simpler to implement. >>>>> >>>>>OTOH, I think Bas (Hamstra) may have switched to rotated bitboards and >>>>>found them to be faster... >>>>> >>>>>-Dan. >>>> >>>>OK, so essentially you don't have any smart way of calculating the attackboard >>>>of sliding pieces?? >>>> >>>>I mean, the normal method of extracting the state of the file, rank or >diagonal, >>>>and using this to quickly calculate an attackboard, cannot be used in a >>>>non-rotated bitboard design. >>>> >>>>Isn't this just worse in all cases compared to rotated bitboards (except for >the >>>>simpler design maybe)? >>> >>>I calculate a "pseudo-attack" bitboard for sliders. (That's a bitboard >>>that has a bit turned on for each enemy piece on which the slider is >>>bearing, ignoring any blocking pieces.) Then I start extracting the >>>coordinates of the "victims". For each such coordinate I test to see if >>>any pieces are blocking the attack using a blocking-mask bitboard which >>>has all the bits turned on for squares that lie between the attacker and >>>victim. If no blockers are there, we have a real victim. >>> >>>I don't think this differs much in cost from rotated bitboards. In >>>some cases it's probably faster (sparse board), in others it's probably >>>slower than rotated bitboards. Even if rotated bitboards happen to be a >>>little faster generating captures, they will certainly slow down make() >>>and undo(). >>> >>>I don't think that it's really easy to figure out which is better, either by >>>reason or by experiment. You can't just count instructions executed--there >>>are memory/cache effects to be considered. Also, the quality of the >>>implementations will vary, and that is likely to have an even bigger effect. >>>So I might add rotated bitboards and find it slows me down--only because of >>>poor design. >>> >>>-Dan. >> >>Could you post please your LastOne() function >>in assembly or if it is in C and also write down what >>processor you use? >> >>Of course only if it's not the same as crafty, if it is >>the same i don't see how you can ignore the macro that >>eats most system time of the entire generation! > >Have a look at mine, Vincent. >label is just a parm so that I can use the code several times in my program. >Count is where the last moves were added to the move list. >bb1 is a bitboard of moves or captures. is that a 32 bits result bitboard with captures for a certain piece? I see you load it in ESI (correct me if i'm wrong, but that's a 32 bits register isn't it?), how to scan the upper 32 bits of the bitboard? >output is where a capture or non-capture movelist starts. in the memory i guess? >I also have a MoveModel that has FromPieceType, FromSquare, and Flags filled in >before using this macro. >tcmovesize is defined as 16 which is the size of each of my movelist entries. >The code basically loads the movemodel into a MMX register, then uses the "bsf" >instruction to get a square number from a capture or move bitboard. This is >or'ed with the MMX register containing the move model and then stored in the >move list. I have a seperate extract for Pawns that adds a value to the >extracted square to get the "from" square. The From and To squares are then >or'ed with the MoveModel and stored into a capture/move list. >Larry. > > > >#define defbbExtractAndOrMoveModel(label,Count,bb1,output)\ > {\ > _ESI = (unsigned long)bb1;\ > _EDX = (unsigned long)output;\ > asm mov ECX,[Count];\ > asm mov EDI,dword ptr [ESI];\ > asm shl ECX,4;\ > asm mov ESI,dword ptr [ESI+4];\ > asm movq mm0,qword ptr [bbTCMoveModel];\ > asm movq mm2,qword ptr [bbTCMoveModel+8];\ > asm label##loop1:;\ > asm bsf EAX,EDI;\ > asm jz label##loop2;\ > asm btr EDI,EAX;\ > asm movd mm1,EAX;\ > asm psllq mm1,32;\ > asm por mm1,mm0;\ > asm movq [EDX+ECX],mm1;\ > asm movq [EDX+ECX+8],mm2;\ > asm add ECX,tcmovesize;\ > asm jmp label##loop1;\ > asm label##loop2:;\ > asm bsf EAX,ESI;\ > asm jz label##Done;\ > asm btr ESI,EAX;\ > asm add EAX,32;\ > asm movd mm1,EAX;\ > asm psllq mm1,32;\ > asm por mm1,mm0;\ > asm movq [EDX+ECX],mm1;\ > asm movq [EDX+ECX+8],mm2;\ > asm add ECX,tcmovesize;\ > asm jmp label##loop2;\ > asm label##Done:;\ > asm shr ECX,4;\ > asm mov [Count],ECX;\ > } Hope you also detect whether you run on a P4 or not, because on P4 the MMX instructions are that pathetic slow executing each that it's not longer smart to use MMX. But i don't see the speedup when compared to crafty when talking about branches here. You use on average 2 JZ instructions for each move. Let's do a very non-educated guess and estimate that from each 2 JZ instructions at least 1 is mispredicted. That means that you lose in number of clocks (at least) on the P3 : 10 P4 : 20 So you waste 10 CLOCKS on a P3!!!!!!! Just to get a single instruction out of a bitboard! Added to that is a lot of extra overhead also, as you use a lot of complex instructions. psllq i need to figure out what it does but perhaps you can already post that!.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.