Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: 0x88 and move generator speed

Author: Vincent Diepeveen

Date: 19:43:16 02/02/01

Go up one level in this thread


On February 02, 2001 at 02:57:58, Dan Newman wrote:

>On February 01, 2001 at 21:24:13, Vincent Diepeveen wrote:
>
>>On February 01, 2001 at 07:12:07, Dan Newman wrote:
>>
>>>On February 01, 2001 at 03:36:08, David Rasmussen wrote:
>>>
>>>>On January 31, 2001 at 07:15:59, Dan Newman wrote:
>>>>
>>>>>On January 30, 2001 at 06:26:17, David Rasmussen wrote:
>>>>>
>>>>>It just means I don't use rotated bitboards.  I have a bitboard for each
>>>>>different piece type (12 bitboards) + occupied square bitboards for each
>>>>>color (2) + an occupied square bitboard (1).  This means make() and undo()
>>>>>are a bit cheaper since I don't have to update rotated bitboards.  Also,
>>>>>there are a few large data structures (arrays of bitboards) that aren't
>>>>>needed as well--so fewer memory hits.
>>>>>
>>>>>I treat the non-sliding pieces more or less like any other bitboard program,
>>>>>but for sliding pieces I occasionally resort to scanning the board.  I guess
>>>>>it might be thought of as a sort of hybrid of bitboard and mailbox.
>>>>>
>>>>>I suspect (but don't have any hard data) that this is cheaper than doing
>>>>>rotated bitboards, and it's also much simpler to implement.
>>>>>
>>>>>OTOH, I think Bas (Hamstra) may have switched to rotated bitboards and
>>>>>found them to be faster...
>>>>>
>>>>>-Dan.
>>>>
>>>>OK, so essentially you don't have any smart way of calculating the attackboard
>>>>of sliding pieces??
>>>>
>>>>I mean, the normal method of extracting the state of the file, rank or >diagonal,
>>>>and using this to quickly calculate an attackboard, cannot be used in a
>>>>non-rotated bitboard design.
>>>>
>>>>Isn't this just worse in all cases compared to rotated bitboards (except for >the
>>>>simpler design maybe)?
>>>
>>>I calculate a "pseudo-attack" bitboard for sliders.  (That's a bitboard
>>>that has a bit turned on for each enemy piece on which the slider is
>>>bearing, ignoring any blocking pieces.)  Then I start extracting the
>>>coordinates of the "victims".  For each such coordinate I test to see if
>>>any pieces are blocking the attack using a blocking-mask bitboard which
>>>has all the bits turned on for squares that lie between the attacker and
>>>victim.  If no blockers are there, we have a real victim.
>>>
>>>I don't think this differs much in cost from rotated bitboards.  In
>>>some cases it's probably faster (sparse board), in others it's probably
>>>slower than rotated bitboards.  Even if rotated bitboards happen to be a
>>>little faster generating captures, they will certainly slow down make()
>>>and undo().
>>>
>>>I don't think that it's really easy to figure out which is better, either by
>>>reason or by experiment.  You can't just count instructions executed--there
>>>are memory/cache effects to be considered.  Also, the quality of the
>>>implementations will vary, and that is likely to have an even bigger effect.
>>>So I might add rotated bitboards and find it slows me down--only because of
>>>poor design.
>>>
>>>-Dan.
>>
>>Could you post please your LastOne() function
>>in assembly or if it is in C and also write down what
>>processor you use?
>>
>
>I'm currently (mostly) using a P3 clocked at 980 MHz.
>
>>Of course only if it's not the same as crafty, if it is
>>the same i don't see how you can ignore the macro that
>>eats most system time of the entire generation!
>
>I haven't attempted to be symmetrical, so I just have one function
>called next_one() which uses BSF and is a member function of my
>bitboard class.  This function not only extracts the bit index but
>also clears it.
>
>//
>// This function gives access to the BSF
>// (bit scan forward) opcode and is used
>// to extract bit indices from bitboards.
>//
>#ifdef _MSC_VER
>#pragma warning( push)
>#pragma warning( disable: 4035)
>inline int bsf( uint32 bitpat)
>{
>    __asm bsf eax, dword ptr bitpat
>}
>#pragma warning( pop)
>
>
>    //
>    // Bit index extraction.  Member function of bitboard_t.
>    //
>    int next_one() {
>        if( lower ) {
>            int index = bsf( lower);
>            lower ^= (1ul << index);
>            return index;
>        } else {  // Not testing upper since we won't run this on empty...
>            int index = bsf( upper);
>            upper ^= (1ul << index);
>            return index + 32;
>        }
>    }
>
>The bsf() function has the only asm used in my program--just one line.
>I've tried coding more of next_one() in assembly, but I've only managed
>to slow the program down...
>
>"upper" and "lower" are just the two 32-bit halves of the bitboard.
>
>-Dan.

Ah yes, this is not so bad except for one real slow branch which
might get mispredicted!

Perhaps start to rewrite that one a bit too till there is not
a single 'if then else' left?

Because on average you can easily use 15 extra instructions to
get rid of it!

Even more on the P4 btw, except those shift instructions,
to my amazement they were slower on the P4, but that might change
next release of P4...



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.