Author: Gerd Isenberg
Date: 15:22:00 01/21/03
On January 21, 2003 at 15:27:26, David Rasmussen wrote:

>What I was hoping for was x86 (Athlon XP, primarily) functions for _all_
>or most of the below simple inline functions, since it seems that MSVC and Intel
>generate horrible code (function calls for shifting etc.!) for these
>fundamental functions. They still manage to be a lot faster than gcc, Borland
>and Sun for some reason.
>
>For example: (see below)
>
>On January 19, 2003 at 14:12:35, Gerd Isenberg wrote:
>
>>>
>>>(well these are some constants, but maybe they're interesting to some)
>>>const BitBoard lightSquares = 0x55aa55aa | (BitBoard(0x55aa55aa) << 32);
>>>const BitBoard darkSquares = ~lightSquares;
>>>const BitBoard center = 0x18000000 | BitBoard(0x00000018) << 32;
>>>
>>>//INLINE BitBoard Mask(Square square) { return BitBoard(1) << square; }
>
>This code is two loads (loads the bitboard) and a function call(!) on MSVC and
>Intel. So, what is a fast assembly function to do the same? I am sure it can be
>done faster.

Hi David,

yes, see http://www.talkchess.com/forums/1/message.html?278421
With MSC there is no way to avoid the unnecessary store/load sequence in front of
inlined asm functions, so small lookup tables in C are probably the fastest.

>
>>>INLINE BitBoard Mask(Square square) { return mask[square]; }
>>>INLINE BitBoard RankMask(Rank rank) { return rankMask[rank]; }
>
>rankMask is (as expected) a mask of all 1's at the relevant rank, so it's
>11111111 shifted rank*8 times to the left, if rank is zero-indexed. This can
>probably be done faster than a memory lookup too, if it's not put in the hands
>of MSVC and Intel, which would probably just do the shift with a function call.
>So, again: a faster assembly function should be possible. Please help, assembly
>programmers!
>
>Pretty please?
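A minimal portable-C sketch of the small-lookup-table approach for Mask() and RankMask(). It assumes BitBoard is a 64-bit unsigned integer, Square and Rank are plain ints, and square 0 is bit 0 (a1 in the usual mapping); the post does not show these typedefs or how mask[] and rankMask[] are filled, so InitMasks is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed typedefs: not shown in the post. Square 0 is bit 0. */
typedef uint64_t BitBoard;
typedef int Square;   /* 0..63 */
typedef int Rank;     /* 0..7  */

/* Small lookup tables, filled once at program start. */
static BitBoard mask[64];
static BitBoard rankMask[8];

/* Hypothetical initializer: call once before Mask()/RankMask(). */
static void InitMasks(void)
{
    for (int sq = 0; sq < 64; ++sq)
        mask[sq] = (BitBoard)1 << sq;              /* single bit per square */
    for (int r = 0; r < 8; ++r)
        rankMask[r] = (BitBoard)0xFF << (8 * r);   /* 11111111 shifted rank*8 */
}

/* The table-based versions from the post: one load each, no shift call. */
static inline BitBoard Mask(Square square) { return mask[square]; }
static inline BitBoard RankMask(Rank rank) { return rankMask[rank]; }
```

After InitMasks(), RankMask(1) is 0xFF00 and Mask(63) is 0x8000000000000000; the tables cost 72 * 8 bytes and stay hot in cache during a search.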
Either learn assembler yourself or wait for Hammer ;-)

> <snip>
>I'm sure it's fine, but what I would like is x86 assembly functions for these
>instead of C++ functions since the compilers I've tried generate lousy code.
>I'm asking all you assembly programmers to help me (and others making bitboard
>programs on x86), because I'm not much of an assembly programmer myself.

See the link above.

>
>For example:
>
>>>I don't remember where I got these, I probably stole them or copied them from
>>>discussions on CCC. Maybe they can be even faster:
>>>
>>>INLINE int FirstBit(const BitBoard bitboard)
>>>{
>>>  __asm {
>>>    bsf eax,[bitboard+4]
>>>    xor eax, 32
>>>    bsf eax,[bitboard]
>>>  }
>>>}
>>>
>>>INLINE int LastBit(const BitBoard bitboard)
>>>{
>>>  __asm {
>>>    bsr eax,[bitboard]
>>>    sub eax,32
>>>    bsr eax,[bitboard+4]
>>>    add eax,32
>>>  }
>>>}
>>
> <snip>
>
>I have no idea. I didn't make those functions, I just stole them or borrowed
>them. So if they can be optimized, then please do! But it's not only these
>assembly language functions that I want you to look at, I want functions for
>some or all of the above inline functions that are now in C++.
>

Simply replace them.
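To make the split-word trick in FirstBit/LastBit easier to follow, here is a portable C sketch with the same control flow. It assumes BitBoard is a 64-bit unsigned integer and that the argument is nonzero; Bsf32 and Bsr32 are hypothetical helpers standing in for the bsf/bsr instructions. The asm versions rely on bsf/bsr leaving eax untouched when the scanned 32-bit half is zero (AMD documents this behaviour; Intel's manual calls the destination undefined), which the sketch models with explicit if tests.

```c
#include <stdint.h>

typedef uint64_t BitBoard;   /* assumed 64-bit bitboard */

/* Hypothetical stand-in for bsf: lowest set bit of a nonzero word. */
static int Bsf32(uint32_t w)
{
    int i = 0;
    while (!(w & 1u)) { w >>= 1; ++i; }
    return i;
}

/* Hypothetical stand-in for bsr: highest set bit of a nonzero word. */
static int Bsr32(uint32_t w)
{
    int i = 31;
    while (!(w & 0x80000000u)) { w <<= 1; --i; }
    return i;
}

/* Same steps as the asm FirstBit. Precondition: bitboard != 0. */
static int FirstBit(BitBoard bitboard)
{
    uint32_t lo = (uint32_t)bitboard;
    uint32_t hi = (uint32_t)(bitboard >> 32);
    int result = 0;
    if (hi) result = Bsf32(hi) ^ 32;   /* xor 32 adds 32 to an index 0..31 */
    if (lo) result = Bsf32(lo);        /* a nonzero low half wins */
    return result;
}

/* Same steps as the asm LastBit. Precondition: bitboard != 0. */
static int LastBit(BitBoard bitboard)
{
    uint32_t lo = (uint32_t)bitboard;
    uint32_t hi = (uint32_t)(bitboard >> 32);
    int result = 0;
    if (lo) result = Bsr32(lo) - 32;   /* pre-subtract, undone below */
    if (hi) result = Bsr32(hi);        /* a nonzero high half wins */
    return result + 32;
}
```

For example, FirstBit(0x0000001000000100) yields 8 and LastBit of the same value yields 36.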
>>
>>
>>>
>>>INLINE int PopCount(BitBoard a) // MMX
>>>{
>>>  static const __int64 C55 = 0x5555555555555555;
>>>  static const __int64 C33 = 0x3333333333333333;
>>>  static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
>>>
>>>  __asm {
>>>    movd mm0, dword ptr a;
>>>    punpckldq mm0, dword ptr a + 4;
>>>    movq mm1, mm0;
>>>    psrld mm0, 1;
>>>    pand mm0, [C55];
>>>    psubd mm1, mm0;
>>>    movq mm0, mm1;
>>>    psrld mm1, 2;
>>>    pand mm0, [C33];
>>>    pand mm1, [C33];
>>>    paddd mm0, mm1;
>>>    movq mm1, mm0;
>>>    psrld mm0, 4;
>>>    paddd mm0, mm1;
>>>    pand mm0, [C0F];
>>>    pxor mm1, mm1;
>>>    psadbw mm0, mm1;
>>>    movd eax, mm0;
>>>    emms; femms for athlon is faster
>>// or skip emms entirely, if you don't use floating point
>>>  }
>>>}
>>
>>
>>I found this modified one slightly faster (saves a few bytes):
>>
>>Regards,
>>Gerd
>>
>>---------------------------------------------------------------
>>
>>struct SBitCountConsts
>>{
>>  BitBoard C55;
>>  BitBoard C33;
>>  BitBoard C0F;
>>  ...
>>};
>>extern const SBitCountConsts BitCountConsts;
>>
>>__forceinline
>>int PopCount (BitBoard bb)
>>{
>>  __asm
>>  {
>>    movd      mm0, dword ptr bb
>>    punpckldq mm0, dword ptr bb + 4
>>    lea       eax, [BitCountConsts]
>>    movq      mm1, mm0
>>    psrld     mm0, 1
>>    pand      mm0, [eax].C55
>>    psubd     mm1, mm0
>>    movq      mm0, mm1
>>    psrld     mm1, 2
>>    pand      mm0, [eax].C33
>>    pand      mm1, [eax].C33
>>    paddd     mm0, mm1
>>    movq      mm1, mm0
>>    psrld     mm0, 4
>>    paddd     mm0, mm1
>>    pand      mm0, [eax].C0F
>>    pxor      mm1, mm1
>>    psadbw    mm0, mm1
>>    movd      eax, mm0
>>  }
>>}
>
>OK, I will try it (if I can understand how to use it)
>

The struct SBitCountConsts part is plain C. You simply have to initialize the
struct somewhere in a C file with the appropriate constants. For the rest,
simply replace the asm body.

Gerd

>Please help me! I suck at assembler!
>
>/David
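A possible way to do the initialization described above, paired with a portable C version of the same SWAR popcount for checking the MMX code against. It assumes BitBoard is a 64-bit unsigned integer; the post's struct has further members (elided there), so this sketch declares only the three the popcount reads. PopCountPortable is a hypothetical name, and its final multiply-and-shift is a swapped-in substitute for the psadbw byte sum, not part of the original asm.

```c
#include <stdint.h>

typedef uint64_t BitBoard;   /* assumed 64-bit bitboard */

/* Only the members used by PopCount; the original struct has more. */
struct SBitCountConsts
{
    BitBoard C55;
    BitBoard C33;
    BitBoard C0F;
};

/* Defined once in a C file; the asm version reaches it via lea eax, [BitCountConsts]. */
const struct SBitCountConsts BitCountConsts = {
    0x5555555555555555ULL,
    0x3333333333333333ULL,
    0x0F0F0F0F0F0F0F0FULL
};

/* Hypothetical portable counterpart of the MMX PopCount, same SWAR steps. */
static int PopCountPortable(BitBoard bb)
{
    const struct SBitCountConsts *c = &BitCountConsts;
    bb = bb - ((bb >> 1) & c->C55);             /* 2-bit partial counts */
    bb = (bb & c->C33) + ((bb >> 2) & c->C33);  /* 4-bit partial counts */
    bb = (bb + (bb >> 4)) & c->C0F;             /* 8-bit counts per byte */
    /* psadbw sums the 8 bytes; here a multiply collects them in the top byte. */
    return (int)((bb * 0x0101010101010101ULL) >> 56);
}
```

PopCountPortable(0x55aa55aa55aa55aa) returns 32, the number of squares in lightSquares.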
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.