Author: Gerd Isenberg
Date: 15:22:00 01/21/03
On January 21, 2003 at 15:27:26, David Rasmussen wrote:
>What I was hoping for was x86 (Athlon XP, primarily) functions for _all_
>or most of the below simple inline functions, since it seems that MSVC and Intel
>generate horrible code (function calls for shifting etc.!) for these
>fundamental functions. They still manage to be a lot faster than gcc, Borland
>and Sun for some reason.
>
>For example: (see below)
>
>On January 19, 2003 at 14:12:35, Gerd Isenberg wrote:
>
>>>
>>>(well these are some constants, but maybe they're interesting to some)
>>>const BitBoard lightSquares = 0x55aa55aa | (BitBoard(0x55aa55aa) << 32);
>>>const BitBoard darkSquares = ~lightSquares;
>>>const BitBoard center = 0x18000000 | BitBoard(0x00000018) << 32;
>>>
>>>//INLINE BitBoard Mask(Square square) { return BitBoard(1) << square; }
>
>This code is two loads (loads the bitboard) and a function call(!) on MSVC and
>Intel. So, what is a fast assembly function to do the same? I am sure it can be
>done faster.
Hi David,
yes, see
http://www.talkchess.com/forums/1/message.html?278421
With MSC there is no way to skip the unnecessary store/load prefix with inlined
asm functions, so small lookup tables in C are probably the fastest.
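A minimal sketch of such lookup tables, assuming BitBoard is an unsigned 64-bit integer and that squares and ranks are plain ints (the thread uses Square and Rank types whose definitions are not shown). InitMasks is a hypothetical name for a one-time setup routine:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t BitBoard;  // assumption: 64-bit unsigned bitboard

static BitBoard mask[64];     // mask[s] has only bit s set
static BitBoard rankMask[8];  // rankMask[r] has the eight bits of rank r set

// Hypothetical setup routine: fill both tables once at program start.
void InitMasks()
{
    for (int s = 0; s < 64; ++s)
        mask[s] = BitBoard(1) << s;
    for (int r = 0; r < 8; ++r)
        rankMask[r] = BitBoard(0xFF) << (8 * r);
}

// After InitMasks(), each call is a single indexed load, no 64-bit shift.
inline BitBoard Mask(int square)
{
    assert(square >= 0 && square < 64);
    return mask[square];
}

inline BitBoard RankMask(int rank)
{
    assert(rank >= 0 && rank < 8);
    return rankMask[rank];
}
```

The shifts only run during setup, so it should not matter much how the compiler implements them.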
>
>>>INLINE BitBoard Mask(Square square) { return mask[square]; }
>>>INLINE BitBoard RankMask(Rank rank) { return rankMask[rank]; }
>
>rankmask is (as expected) a mask of all 1's at the relevant rank, so it's
>11111111 shifted rank*8 times to the left, if rank is zero-indexed. This can
>probably be done faster than a memory lookup too, if it's not put in the hands
>of MSVC and Intel, which would probably just do the shift with a function call.
>So, again: A faster assembly function should be possible. Please help, assembly
>programmers!
>
>Pretty please?
Learning assembler by yourself or waiting for hammer ;-)
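If you want to try the shift variant without assembler first, here is a rough C++ sketch. It assumes BitBoard is an unsigned 64-bit integer; RankMaskByShift is a made-up name, not something from this thread. The idea is to do the variable shift on a 32-bit value and only then place it in the low or high half, so the compiler has no variable 64-bit shift to turn into a helper call. Whether that beats the table lookup would have to be measured:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t BitBoard;  // assumption: 64-bit unsigned bitboard

// Hypothetical helper: builds the mask of one rank (zero-indexed)
// from a 32-bit shift plus a constant move into the upper half.
inline BitBoard RankMaskByShift(int rank)
{
    assert(rank >= 0 && rank < 8);
    // Ranks 0..3 sit in the low dword, ranks 4..7 in the high dword.
    std::uint32_t half = std::uint32_t(0xFF) << (8 * (rank & 3));
    return rank < 4 ? BitBoard(half) : BitBoard(half) << 32;
}
```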
>
<snip>
>I'm sure it's fine, but what I would like is x86 assembly functions for these
>instead of C++ functions since the compilers I've tried generate lousy code.
>I'm asking all you assembly programmers to help me (and others making bitboard
>programs on x86), because I'm not much of an assembly programmer myself.
See the link above.
>
>For example:
>
>>>I don't remember where I got these, I probably stole them or copied them from
>>>discussions on CCC. Maybe they can be even faster:
>>>
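>>>INLINE int FirstBit(const BitBoard bitboard)
>>>{
>>>    // Relies on bsf leaving eax unchanged when the source is zero
>>>    // (documented by AMD, formally undefined on Intel).
>>>    // The result is meaningless if bitboard is zero.
>>>    __asm
>>>    {
>>>        bsf eax,[bitboard+4]  ; lowest set bit of the high dword, if any
>>>        xor eax, 32           ; 0..31 becomes 32..63
>>>        bsf eax,[bitboard]    ; overwritten by the low dword's bit, if any
>>>    }
>>>}
>>>INLINE int LastBit(const BitBoard bitboard)
>>>{
>>>    // Same trick with bsr: eax is only overwritten if the source is nonzero.
>>>    __asm
>>>    {
>>>        bsr eax,[bitboard]    ; highest set bit of the low dword, if any
>>>        sub eax,32            ; compensates in advance for the add below
>>>        bsr eax,[bitboard+4]  ; overwritten by the high dword's bit, if any
>>>        add eax,32            ; high dword: 32..63, low dword: back to 0..31
>>>    }
>>>}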
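For anyone who doesn't read x86 assembly, here is a slow but portable C++ sketch of what the two routines above are meant to return. It can serve as a reference when testing replacements. The names FirstBitRef and LastBitRef are made up for this sketch, and BitBoard is assumed to be an unsigned 64-bit integer:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t BitBoard;  // assumption: 64-bit unsigned bitboard

// Index (0..63) of the lowest set bit. Like the asm version,
// this is only meaningful for a non-empty bitboard.
int FirstBitRef(BitBoard bb)
{
    assert(bb != 0);
    int index = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        ++index;
    }
    return index;
}

// Index (0..63) of the highest set bit, same precondition.
int LastBitRef(BitBoard bb)
{
    assert(bb != 0);
    int index = 0;
    while (bb >>= 1)
        ++index;
    return index;
}
```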
>>
<snip>
>
>I have no idea. I didn't make those functions, I just stole them or borrowed
>them. So if they can be optimized, then please do! But it's not only these
>assembly language functions that I want you to look at, I want functions for
>some or all of the above inline functions that are now in C++.
>
Simply replace them.
>>
>>
>>>
>>>INLINE int PopCount(BitBoard a) // MMX
>>>{
>>> static const __int64 C55 = 0x5555555555555555;
>>> static const __int64 C33 = 0x3333333333333333;
>>> static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
>>>
>>> __asm {
>>> movd mm0, dword ptr a;
>>> punpckldq mm0, dword ptr a + 4;
>>> movq mm1, mm0;
>>> psrld mm0, 1;
>>> pand mm0, [C55];
>>> psubd mm1, mm0;
>>> movq mm0, mm1;
>>> psrld mm1, 2;
>>> pand mm0, [C33];
>>> pand mm1, [C33];
>>> paddd mm0, mm1;
>>> movq mm1, mm0;
>>> psrld mm0, 4;
>>> paddd mm0, mm1;
>>> pand mm0, [C0F];
>>> pxor mm1, mm1;
>>> psadbw mm0, mm1;
>>> movd eax, mm0;
>>> emms; femms for athlon is faster
>>// or skip emms at all, if you don't use float
>>> }
>>>}
>>
>>
>>I found this modified one slightly faster (saves a few bytes):
>>
>>Regards,
>>Gerd
>>
>>---------------------------------------------------------------
>>
>>struct SBitCountConsts
>>{
>> BitBoard C55;
>> BitBoard C33;
>> BitBoard C0F;
>> ...
>>};
>>extern const SBitCountConsts BitCountConsts;
>>
>>__forceinline
>>int PopCount (BitBoard bb)
>>{
>> __asm
>> {
>> movd mm0, dword ptr bb
>> punpckldq mm0, dword ptr bb + 4
>> lea eax, [BitCountConsts]
>> movq mm1, mm0
>> psrld mm0, 1
>> pand mm0, [eax].C55
>> psubd mm1, mm0
>> movq mm0, mm1
>> psrld mm1, 2
>> pand mm0, [eax].C33
>> pand mm1, [eax].C33
>> paddd mm0, mm1
>> movq mm1, mm0
>> psrld mm0, 4
>> paddd mm0, mm1
>> pand mm0, [eax].C0F
>> pxor mm1, mm1
>> psadbw mm0, mm1
>> movd eax, mm0
>> }
>>}
>
>OK, I will try it (if I can understand how to use it)
>
The struct SBitCountConsts part is plain C. You simply have to define and
initialize the struct once in a C file with the appropriate constants. For the
rest, simply replace the asm body.
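A sketch of what that initialization might look like, assuming BitBoard is an unsigned 64-bit integer. Only the three members used by PopCount are shown here; the further members elided above are left out. PopCountRef is a made-up portable reference that follows the same steps as the MMX code, handy for checking the asm version:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t BitBoard;  // assumption: 64-bit unsigned bitboard

// Only the members PopCount needs; the original struct has more.
struct SBitCountConsts
{
    BitBoard C55;
    BitBoard C33;
    BitBoard C0F;
};
extern const SBitCountConsts BitCountConsts;

// The one definition, placed in a single source file.
const SBitCountConsts BitCountConsts = {
    0x5555555555555555ULL,
    0x3333333333333333ULL,
    0x0F0F0F0F0F0F0F0FULL
};

// Hypothetical portable reference mirroring the MMX steps.
int PopCountRef(BitBoard bb)
{
    const SBitCountConsts& c = BitCountConsts;
    bb = bb - ((bb >> 1) & c.C55);            // 2-bit sums
    bb = (bb & c.C33) + ((bb >> 2) & c.C33);  // 4-bit sums
    bb = (bb + (bb >> 4)) & c.C0F;            // one count per byte
    int sum = 0;                              // psadbw against zero adds the 8 bytes
    for (int i = 0; i < 8; ++i)
        sum += int((bb >> (8 * i)) & 0xFF);
    assert(sum <= 64);                        // a 64-bit board holds at most 64 bits
    return sum;
}
```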
Gerd
>Please help me! I suck at assembler!
>
>/David
Last modified: Thu, 15 Apr 21 08:11:13 -0700