Author: Tim Foden
Date: 15:49:11 06/21/02
On June 21, 2002 at 13:47:04, Gerd Isenberg wrote:

>On June 20, 2002 at 17:10:46, Tim Foden wrote:
>
>Thanks Tim for sharing your code. I tried it and it's definitely faster (at
>least on an AMD 1.4 GHz). I only have first impressions, and have to do some
>profiling. In one test position so far I have a solution time of 102 sec instead
>of 115 sec (may be influenced by some other chaotic optimizing or whatever
>effects). Even if the number of used registers is larger in your inline routine
>and therefore it may be more difficult to hold vars of the calling routine in
>registers, it's faster - very interesting.
>
>It seems that these vector path instructions (bsf, btr) are very inefficient on
>Athlons.

It's nice to know I'm not going mad :) I keep trying different versions of
this algorithm, but I always find that the one with branches is fastest!

BTW, here is the fastest one I have that doesn't have branches. It is only a
little slower than the branching version, and it is faster only in cases where
all the pieces are on the opposite side of the board from normal. It still
uses 4 registers too, I'm afraid. Also, this one doesn't cope with empty
bitmaps.

    inline Square BitScanForwardRemove( UINT64& bitmap )
    {
        ASSERT( bitmap );
        __asm
        {
            mov     ebx, [bitmap]           ; ebx = address of the 64-bit bitmap
            mov     eax, 0
            cmp     dword ptr [ebx], eax    ; is the low dword empty?
            mov     edx, 1                  ; mov leaves the flags untouched
            setz    al                      ; eax = 1 if low dword is zero, else 0
            bsf     ecx, [ebx + eax * 4]    ; ecx = lowest set bit in chosen dword
            lea     ebx, [ebx + eax * 4]    ; ebx = address of chosen dword
            shl     eax, 5                  ; eax = 0 or 32 (base square of dword)
            shl     edx, cl                 ; edx = mask for the bit found
            add     eax, ecx                ; eax = square index, 0..63
            xor     [ebx], edx              ; clear that bit in the bitmap
        }
        // value returned in eax
    }

Cheers, Tim.
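For anyone who wants to follow the branchless routine without reading the asm, here is a rough portable C++ sketch of the same idea. It is not Tim's code: the names BitScanForwardRemovePortable and LowestBitIndex32 are made up for illustration, the return type is a plain int rather than Square, and the small loop is only a stand-in for the bsf instruction (so this version is not literally branch-free). Like the asm, it assumes a non-empty bitmap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for bsf: index of the lowest set bit of a
// non-zero 32-bit word. The loop is for clarity only.
static inline unsigned LowestBitIndex32(std::uint32_t x)
{
    unsigned n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

// Sketch of the branchless selection: pick the low or high dword by
// index instead of by an if/else, then find and clear its lowest bit.
inline int BitScanForwardRemovePortable(std::uint64_t& bitmap)
{
    assert(bitmap != 0);  // empty bitmaps are not handled, as in the asm

    std::uint32_t half[2] = {
        static_cast<std::uint32_t>(bitmap),        // low dword
        static_cast<std::uint32_t>(bitmap >> 32)   // high dword
    };

    const unsigned sel = (half[0] == 0);               // like: setz al
    const unsigned bit = LowestBitIndex32(half[sel]);  // like: bsf ecx, [ebx + eax*4]
    half[sel] ^= (1u << bit);                          // like: xor [ebx], edx

    bitmap = (static_cast<std::uint64_t>(half[1]) << 32) | half[0];
    return static_cast<int>(sel * 32 + bit);           // like: shl eax, 5 / add eax, ecx
}
```

Calling it repeatedly on a bitmap hands back the set squares from lowest to highest and leaves the bitmap empty after the last one.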
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.