Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Nalimov: bsf/bsr intrinsics implementation still not optimal

Author: Gerd Isenberg

Date: 06:37:46 09/23/04

Go up one level in this thread


On September 23, 2004 at 08:31:33, Dezhi Zhao wrote:

>On September 23, 2004 at 06:12:48, Gerd Isenberg wrote:
>
>>On September 22, 2004 at 13:55:46, Dezhi Zhao wrote:
>>
>>>I'm  really happy that bit operation instructions have become intrinsics for VC
>>>compiler in VS 2003 and later.
>>>
>>>However the output asm code is still not optimal. It generates a pair of
>>>redundant memory-register save and load instrucions. I also tested VC 2005
>>>Express beta1. The same thing again....
>>
>>Yes, such not neccesary store/loads seem a problem with
>>msc-intrinsics:
>>
>>http://chessprogramming.org/cccsearch/ccc.php?art_id=357561
>>http://chessprogramming.org/cccsearch/ccc.php?art_id=357568
>>http://chessprogramming.org/cccsearch/ccc.php?art_id=357570
>>
>>But in your case, based on the prototype of the instrinsic, i guess a boolean
>>return, but a pointer to index, the bsf-intrinsic is foreced to store the found
>>bitindex to memory.
>>
>>Gerd
>>
>
>You're probally right that the prototype fools the compiler. However I think the
>complier still has good oppotunities to eliminate those store/load instructions
>when it comes to optimize the code.

I think that too, but that is probably not so easy, considering alias pointer to
the same address.

>Now look at the prototype again:
>
>unsigned char _BitScanForward(unsigned long* index, unsigned long mask);
>
>It returns a boolean value. This is to handle the case where mask == 0.
>I don't think BitScanForward needs such exception handler. I like this better:
>
>long _BitScanForward(long mask);
>
>But you need to add a warning in the documentation that the return value is
>undefined if mask == 0 just as the one you can find in bsf/bsr documentation.

Yes, i like the none pointer prototype much better too.

But leaving an undefined return value for a C-instrinsic is probably too
dangerous - even if it is most often used in this while(mask) or if(mask)
context. I suggest returning an negative (or > 31,63) value for mask == 0.

An additional jnc after bsf/bsr is not so expensive, if always not taken,
specially compared to the huge latency of bsf/bsr. But it leaves the option to
build the traverse loop without an explicit != zero test.

>
>I'm sure if it were of the simplier proptype, the current compiler would not
>generate the redundant store/load instructions.
>
>Zhao

Yes.

Gerd






This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.