Author: Dan Newman
Date: 01:16:21 02/03/01
Go up one level in this thread
On February 02, 2001 at 22:43:16, Vincent Diepeveen wrote:
>On February 02, 2001 at 02:57:58, Dan Newman wrote:
>
>>On February 01, 2001 at 21:24:13, Vincent Diepeveen wrote:
>>
>>>On February 01, 2001 at 07:12:07, Dan Newman wrote:
>>>
>>>>On February 01, 2001 at 03:36:08, David Rasmussen wrote:
>>>>
>>>>>On January 31, 2001 at 07:15:59, Dan Newman wrote:
>>>>>
>>>>>>On January 30, 2001 at 06:26:17, David Rasmussen wrote:
>>>>>>
>>>>>>It just means I don't use rotated bitboards. I have a bitboard for each
>>>>>>different piece type (12 bitboards) + occupied square bitboards for each
>>>>>>color (2) + an occupied square bitboard (1). This means make() and undo()
>>>>>>are a bit cheaper since I don't have to update rotated bitboards. Also,
>>>>>>there are a few large data structures (arrays of bitboards) that aren't
>>>>>>needed as well--so fewer memory hits.
>>>>>>
>>>>>>I treat the non-sliding pieces more or less like any other bitboard program,
>>>>>>but for sliding pieces I occasionally resort to scanning the board. I guess
>>>>>>it might be thought of as a sort of hybrid of bitboard and mailbox.
>>>>>>
>>>>>>I suspect (but don't have any hard data) that this is cheaper than doing
>>>>>>rotated bitboards, and it's also much simpler to implement.
>>>>>>
>>>>>>OTOH, I think Bas (Hamstra) may have switched to rotated bitboards and
>>>>>>found them to be faster...
>>>>>>
>>>>>>-Dan.
>>>>>
>>>>>OK, so essentially you don't have any smart way of calculating the attackboard
>>>>>of sliding pieces??
>>>>>
>>>>>I mean, the normal method of extracting the state of the file, rank or >diagonal,
>>>>>and using this to quickly calculate an attackboard, cannot be used in a
>>>>>non-rotated bitboard design.
>>>>>
>>>>>Isn't this just worse in all cases compared to rotated bitboards (except for >the
>>>>>simpler design maybe)?
>>>>
>>>>I calculate a "pseudo-attack" bitboard for sliders. (That's a bitboard
>>>>that has a bit turned on for each enemy piece on which the slider is
>>>>bearing, ignoring any blocking pieces.) Then I start extracting the
>>>>coordinates of the "victims". For each such coordinate I test to see if
>>>>any pieces are blocking the attack using a blocking-mask bitboard which
>>>>has all the bits turned on for squares that lie between the attacker and
>>>>victim. If no blockers are there, we have a real victim.
>>>>
>>>>I don't think this differs much in cost from rotated bitboards. In
>>>>some cases it's probably faster (sparse board), in others it's probably
>>>>slower than rotated bitboards. Even if rotated bitboards happen to be a
>>>>little faster generating captures, they will certainly slow down make()
>>>>and undo().
>>>>
>>>>I don't think that it's really easy to figure out which is better, either by
>>>>reason or by experiment. You can't just count instructions executed--there
>>>>are memory/cache effects to be considered. Also, the quality of the
>>>>implementations will vary, and that is likely to have an even bigger effect.
>>>>So I might add rotated bitboards and find it slows me down--only because of
>>>>poor design.
>>>>
>>>>-Dan.
>>>
>>>Could you post please your LastOne() function
>>>in assembly or if it is in C and also write down what
>>>processor you use?
>>>
>>
>>I'm currently (mostly) using a P3 clocked at 980 MHz.
>>
>>>Of course only if it's not the same as crafty, if it is
>>>the same i don't see how you can ignore the macro that
>>>eats most system time of the entire generation!
>>
>>I haven't attempted to be symmetrical, so I just have one function
>>called next_one() which uses BSF and is a member function of my
>>bitboard class. This function not only extracts the bit index but
>>also clears it.
>>
>>//
>>// This function gives access to the BSF
>>// (bit scan forward) opcode and is used
>>// to extract bit indices from bitboards.
>>//
>>#ifdef _MSC_VER
>>#pragma warning( push)
>>#pragma warning( disable: 4035)
>>inline int bsf( uint32 bitpat)
>>{
>> __asm bsf eax, dword ptr bitpat
>>}
>>#pragma warning( pop)
>>
>>
>> //
>> // Bit index extraction. Member function of bitboard_t.
>> //
>> int next_one() {
>> if( lower ) {
>> int index = bsf( lower);
>> lower ^= (1ul << index);
>> return index;
>> } else { // Not testing upper since we won't run this on empty...
>> int index = bsf( upper);
>> upper ^= (1ul << index);
>> return index + 32;
>> }
>> }
>>
>>The bsf() function has the only asm used in my program--just one line.
>>I've tried coding more of next_one() in assembly, but I've only managed
>>to slow the program down...
>>
>>"upper" and "lower" are just the two 32-bit halves of the bitboard.
>>
>>-Dan.
>
>Ah yes, this is not so bad except for one real slow branch which
>might get mispredicted!
>
>Perhaps start to rewrite that one a bit too till there is not
>a single 'if then else' left?
>
>Because on average you can easily use 15 extra instructions to
>get rid of it!
>
>Even more on the P4 btw, except those shift instructions,
>to my amazement they were slower on the P4, but that might change
>next release of P4...
I can't remember if I actually tried eliminating that branch or not
since it causes a little code bloat, but I did think about it. If I
tried it, then it probably slowed me down a bit, else I would have it
in my code now...
The idea I had is that once the lower 32 bits is exhausted of one-bits,
then you really don't need to test it anymore. All the code that uses
next_one() would be duplicated so that we could use a next_lower_one()
and a next_upper_one() function.
Instead of
while( victims.exist() )
int to = victims.next_one();
....
}
I'd use
while( victims.lower_exist() ) {
int to = victims.next_lower_one();
....
}
while( victims.upper_exist() ) {
int to = victims.next_upper_one();
....
}
This would double the size of movegen(), so it might lose all of the
gain by increasing its cache footprint--not that it's really huge...
-Dan.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.