Author: Dan Newman
Date: 01:16:21 02/03/01
Go up one level in this thread
On February 02, 2001 at 22:43:16, Vincent Diepeveen wrote: >On February 02, 2001 at 02:57:58, Dan Newman wrote: > >>On February 01, 2001 at 21:24:13, Vincent Diepeveen wrote: >> >>>On February 01, 2001 at 07:12:07, Dan Newman wrote: >>> >>>>On February 01, 2001 at 03:36:08, David Rasmussen wrote: >>>> >>>>>On January 31, 2001 at 07:15:59, Dan Newman wrote: >>>>> >>>>>>On January 30, 2001 at 06:26:17, David Rasmussen wrote: >>>>>> >>>>>>It just means I don't use rotated bitboards. I have a bitboard for each >>>>>>different piece type (12 bitboards) + occupied square bitboards for each >>>>>>color (2) + an occupied square bitboard (1). This means make() and undo() >>>>>>are a bit cheaper since I don't have to update rotated bitboards. Also, >>>>>>there are a few large data structures (arrays of bitboards) that aren't >>>>>>needed as well--so fewer memory hits. >>>>>> >>>>>>I treat the non-sliding pieces more or less like any other bitboard program, >>>>>>but for sliding pieces I occasionally resort to scanning the board. I guess >>>>>>it might be thought of as a sort of hybrid of bitboard and mailbox. >>>>>> >>>>>>I suspect (but don't have any hard data) that this is cheaper than doing >>>>>>rotated bitboards, and it's also much simpler to implement. >>>>>> >>>>>>OTOH, I think Bas (Hamstra) may have switched to rotated bitboards and >>>>>>found them to be faster... >>>>>> >>>>>>-Dan. >>>>> >>>>>OK, so essentially you don't have any smart way of calculating the attackboard >>>>>of sliding pieces?? >>>>> >>>>>I mean, the normal method of extracting the state of the file, rank or >diagonal, >>>>>and using this to quickly calculate an attackboard, cannot be used in a >>>>>non-rotated bitboard design. >>>>> >>>>>Isn't this just worse in all cases compared to rotated bitboards (except for >the >>>>>simpler design maybe)? >>>> >>>>I calculate a "pseudo-attack" bitboard for sliders. (That's a bitboard >>>>that has a bit turned on for each enemy piece on which the slider is >>>>bearing, ignoring any blocking pieces.) Then I start extracting the >>>>coordinates of the "victims". For each such coordinate I test to see if >>>>any pieces are blocking the attack using a blocking-mask bitboard which >>>>has all the bits turned on for squares that lie between the attacker and >>>>victim. If no blockers are there, we have a real victim. >>>> >>>>I don't think this differs much in cost from rotated bitboards. In >>>>some cases it's probably faster (sparse board), in others it's probably >>>>slower than rotated bitboards. Even if rotated bitboards happen to be a >>>>little faster generating captures, they will certainly slow down make() >>>>and undo(). >>>> >>>>I don't think that it's really easy to figure out which is better, either by >>>>reason or by experiment. You can't just count instructions executed--there >>>>are memory/cache effects to be considered. Also, the quality of the >>>>implementations will vary, and that is likely to have an even bigger effect. >>>>So I might add rotated bitboards and find it slows me down--only because of >>>>poor design. >>>> >>>>-Dan. >>> >>>Could you post please your LastOne() function >>>in assembly or if it is in C and also write down what >>>processor you use? >>> >> >>I'm currently (mostly) using a P3 clocked at 980 MHz. >> >>>Of course only if it's not the same as crafty, if it is >>>the same i don't see how you can ignore the macro that >>>eats most system time of the entire generation! >> >>I haven't attempted to be symmetrical, so I just have one function >>called next_one() which uses BSF and is a member function of my >>bitboard class. This function not only extracts the bit index but >>also clears it. >> >>// >>// This function gives access to the BSF >>// (bit scan forward) opcode and is used >>// to extract bit indices from bitboards. >>// >>#ifdef _MSC_VER >>#pragma warning( push) >>#pragma warning( disable: 4035) >>inline int bsf( uint32 bitpat) >>{ >> __asm bsf eax, dword ptr bitpat >>} >>#pragma warning( pop) >> >> >> // >> // Bit index extraction. Member function of bitboard_t. >> // >> int next_one() { >> if( lower ) { >> int index = bsf( lower); >> lower ^= (1ul << index); >> return index; >> } else { // Not testing upper since we won't run this on empty... >> int index = bsf( upper); >> upper ^= (1ul << index); >> return index + 32; >> } >> } >> >>The bsf() function has the only asm used in my program--just one line. >>I've tried coding more of next_one() in assembly, but I've only managed >>to slow the program down... >> >>"upper" and "lower" are just the two 32-bit halves of the bitboard. >> >>-Dan. > >Ah yes, this is not so bad except for one real slow branch which >might get mispredicted! > >Perhaps start to rewrite that one a bit too till there is not >a single 'if then else' left? > >Because on average you can easily use 15 extra instructions to >get rid of it! > >Even more on the P4 btw, except those shift instructions, >to my amazement they were slower on the P4, but that might change >next release of P4... I can't remember if I actually tried eliminating that branch or not since it causes a little code bloat, but I did think about it. If I tried it, then it probably slowed me down a bit, else I would have it in my code now... The idea I had is that once the lower 32 bits is exhausted of one-bits, then you really don't need to test it anymore. All the code that uses next_one() would be duplicated so that we could use a next_lower_one() and a next_upper_one() function. Instead of while( victims.exist() ) int to = victims.next_one(); .... } I'd use while( victims.lower_exist() ) { int to = victims.next_lower_one(); .... } while( victims.upper_exist() ) { int to = victims.next_upper_one(); .... } This would double the size of movegen(), so it might lose all of the gain by increasing its cache footprint--not that it's really huge... -Dan.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.