Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboards with BSF/BSR

Author: Wylie Garvin

Date: 17:49:15 12/11/01

Go up one level in this thread


On December 11, 2001 at 20:33:04, Dann Corbit wrote:

>On December 11, 2001 at 20:27:37, Wylie Garvin wrote:
>
>>Hi,
>>
>>  I've noticed some posts where people mention doing things like this:
>>
>>    BSF ecx,dword ptr [foo+4]
>>    ADD ecx,32
>>    BSF ecx,dword ptr [foo]
>>
>>  On the assumption that undefined --> unchanged for PIII, PIV.  While I don't
>>have any evidence to contradict that assumption, it seems like a bad idea to
>>rely on undefined behaviour.
>
>You must be sure that the function is never called with all zero bits.
>
Dann, it sounds like you're agreeing with me.  But I never used the word
"function".  My chess program doesn't contain any "functions".  (The parts I've
written so far don't contain any multiply instructions, either  =)

>>  I point out the conditional CMOVcc instructions which have been supported
>>since around the time of the PPro I think:
>>
>>    BSF ecx,dword ptr [foo+4]
>>    ADD ecx,32
>>    BSF eax,dword ptr [foo]
>>    CMOVZ eax,ecx
>>
>>  This technique will work correctly even if eax happens to be trashed by the
>>BSR instruction if no bits were set.
>>
>>  Also I'd like to mention something that I use to speed up my attack
>>calculations with BSF/BSR...I don't know if it will help you C/C++ programmers,
>>but it's pretty cool in assembly.
>>
>>  Notice that of the 8 rays originating in one half of the board, only 3 of them
>>can possibly cross over to the other side of the board.  So you can reduce that
>>construction above to a single BSR/BSF in the other 5 cases.  My assembly code
>>first decides which half of the board the source square is in, then individually
>>probes the ends of the ray.  To ensure that rays are always terminated, before I
>>start my attack calculations I build a temporary bitboard from the "all pieces"
>>bitboard by ORing in the edges (0xFF818181,818181FF) and then clearing the bit
>>for the source square.  Now I need about 5 BSF/BSRs for rook/queen attacks and 6
>>for bishop/queen attacks.
>>
>>  Also keep in mind that removing a conditional branch in the fashion shown
>>above requires doing the computation for *both* halves of the branch.  In this
>>case it appears to be a win; but if extra computation is required to produce the
>>bitboard, the cost could exceed the cost of a conditional branch.  In general,
>>if the branch is predictable (e.g. if the rays are almost always short, and so
>>almost always stay on the same side of the board) it's likely better not to
>>remove it.
>
>I have found that with the latest Intel compilers, the bit tricks no longer gain
>any speed.  Not the MMX tricks, not the BSF/BSR -- nothing.  The ANSI C is just
>as fast.  Strike that -- faster.

Heh heh.  I respectfully submit that you can *always* produce faster assembly
than any given C compiler.  It's just a matter of how much effort you are
willing to expend...if the compiler does a good job on 95% of the code, then
writing more than 5% in assembly is only for ppl like me who think it's *fun*.

I have the latest Intel compiler, and my only real gripe with it is that it
generates fairly large code to get that speed.  I like the aesthetic appeal of
small assembly code.  By writing my whole program in assembly I can make it
small.  Then I can go back and fix up the parts that VTune says are too slow..

=)
wylie



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.