Author: Dezhi Zhao
Date: 19:05:38 02/16/03
On February 16, 2003 at 20:08:00, Matt Taylor wrote:
>On February 16, 2003 at 15:55:37, Dezhi Zhao wrote:
>
>>On February 16, 2003 at 07:37:31, Gerd Isenberg wrote:
>>
>>>On February 16, 2003 at 05:14:41, Matt Taylor wrote:
>>>
>>>>On February 14, 2003 at 23:52:09, Matt Taylor wrote:
>>>>
>>>>>On February 14, 2003 at 14:31:42, Gerd Isenberg wrote:
>>>>>
>>>>>>I tried two of your routines so far, the cmov/single-bsf one and the
>>>>>>LSB_32_table lookup one. It's a mess, both without any branches, but IsiChess is
>>>>>>still 2-4% slower with these. I guess some chaotic code-size-dependent side
>>>>>>effects (Athlon XP 2000+, 512MB SDRAM).
>>>>>>
>>>>>>Regards,
>>>>>>Gerd
>>>>
>>>>What -did- you use before? After thinking about it, I don't see how these
>>>>routines could make such an impact. It would be very difficult to shrink them
>>>>further.
>>>>
>>>>Have you tried not inlining them? That would make the resulting code smaller.
>>>>
>>>>-Matt
>>>
>>>Hi Matt,
>>>
>>>This one (posted several times already) is the routine I compared with:
>>>
>>>__forceinline
>>>UINT BitSearchAndReset(BitBoard &bb)
>>>{
>>>  __asm
>>>  {
>>>    mov ebx, [bb]      ; ebx = address of the 64-bit board
>>>    xor eax, eax       ; base index 0 (low dword)
>>>    mov edx, 1
>>>    bsf ecx, [ebx]     ; scan low dword, ZF=1 if it is all zero
>>>    jnz found
>>>    bsf ecx, [ebx + 4] ; low dword empty: scan high dword
>>>    lea ebx, [ebx + 4]
>>>    xor eax, 32        ; base index 32 (high dword)
>>>  found:
>>>    shl edx, cl        ; edx = 1 << bit index
>>>    xor eax, ecx       ; return value = base + bit index
>>>    xor [ebx], edx     ; clear the found bit
>>>  }
>>>}
>>>
>>>To my surprise, this conditional routine using four registers is faster than the
>>>following short branchless one, which uses three VectorPath instructions:
>>>
>>>__forceinline
>>>UINT BitSearchAndReset(BitBoard &bb)
>>>{
>>>  __asm
>>>  {
>>>    mov edx, [bb]    ; edx = address of the 64-bit board
>>>    bsf eax, [edx+4] ; scan high dword first
>>>    xor eax, 32      ; bias the result into the 32..63 range
>>>    bsf eax, [edx]   ; a nonzero low dword overwrites eax, else eax is kept
>>>    btr [edx], eax   ; bit-string addressing also clears bits >= 32
>>>  }
>>>}
>>>
>>>Without inlining, your routines and even Walter's magic lsb64 lookup routine
>>>are all about the same on average, but the nps and solution-time performance is
>>>significantly worse (4-10%).
>>>
>>>Regards,
>>>Gerd
>>
>>This sounds quite normal to me. If you choose to inline them, your shorter
>>version will be faster. Matt's functions are optimal in the context of
>>standalone functions. If you choose to inline them, you have to factor in
>>register usage and out-of-order execution. Your shorter version uses fewer
>>registers, and the processor can often find OOE opportunities to hide the 2nd
>>bsf instruction.
>
>First let me say that we are specifically talking about Athlon.
>
>On Athlon, VectorPath instructions (like bsf) decode into a series of macro-ops.
>OOOE cannot hide the 8 cycles of latency this produces because the microcoded
>bsf instruction consumes most (all?) of the execution resources. At times I have
>noted the bsf instruction taking only 7 cycles. Perhaps this is an effect of
>OOOE.
>
I do not have an Athlon. However, I assume it implements bsf, bsr and btr the
same way as the P4. Yes, it is difficult to hide an instruction with an 8-cycle
latency. You can sometimes manage to hide part of it if there is another slow
instruction in the stream, e.g. a shift operation.
>Register usage is a non-issue because VC assumes that all general registers are
>trashed over inline assembly. I can use 1 register of 5 registers; it makes no
>difference. Besides -- he and I both use 4 registers.
>
If you watch the VC-generated code around inlined asm, you will find register
spills. The more registers the inline asm uses, the more spills. VC has no way
to bind a variable to a particular register.
>>Of course, if you go hand-inlining Matt's code into your code (rewrite all the
>>callers in asm), you will see the speedup. And you will find even more pairing
>>opportunities :)
>
>Isn't that what the compiler is already doing?
No. I mean the compiler does a bad job when it encounters inline asm. You can
try it; I bet you won't like the code it generates. It emits some unnecessary
memory operations to free up registers.
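To illustrate the point, the same operation can be written in plain C, so the compiler stays free to inline it and do its own register allocation with no inline-asm barrier. This is a sketch assuming the bitboard is stored as two little-endian 32-bit halves; the function names are mine, not from the code above:

```c
#include <stdint.h>

/* Naive placeholder bit scan; stands in for any of the lsb routines
   discussed above. x must be nonzero. */
static int lsb32(uint32_t x)
{
    int i = 0;
    while ((x & 1) == 0) { x >>= 1; ++i; }
    return i;
}

/* Portable sketch of find-lowest-set-bit-and-clear over a 64-bit board
   kept as two little-endian 32-bit halves. Plain C, so the compiler can
   inline it and allocate registers freely. */
int bit_search_and_reset_c(uint32_t bb[2])
{
    int half = (bb[0] == 0);        /* 0 = low dword, 1 = high dword */
    int bit  = lsb32(bb[half]);
    bb[half] &= bb[half] - 1;       /* clear the found bit */
    return bit + 32 * half;
}
```

The naive loop in lsb32 would of course be replaced by one of the bit-scan techniques discussed in this thread; the point here is only that the surrounding code stays fully visible to the optimizer.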
>
>>Have fun
>>
>>dzhao
>
>-Matt
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.