Author: Gerd Isenberg
Date: 06:37:46 09/23/04
Go up one level in this thread
On September 23, 2004 at 08:31:33, Dezhi Zhao wrote: >On September 23, 2004 at 06:12:48, Gerd Isenberg wrote: > >>On September 22, 2004 at 13:55:46, Dezhi Zhao wrote: >> >>>I'm really happy that bit operation instructions have become intrinsics for VC >>>compiler in VS 2003 and later. >>> >>>However the output asm code is still not optimal. It generates a pair of >>>redundant memory-register save and load instrucions. I also tested VC 2005 >>>Express beta1. The same thing again.... >> >>Yes, such not neccesary store/loads seem a problem with >>msc-intrinsics: >> >>http://chessprogramming.org/cccsearch/ccc.php?art_id=357561 >>http://chessprogramming.org/cccsearch/ccc.php?art_id=357568 >>http://chessprogramming.org/cccsearch/ccc.php?art_id=357570 >> >>But in your case, based on the prototype of the instrinsic, i guess a boolean >>return, but a pointer to index, the bsf-intrinsic is foreced to store the found >>bitindex to memory. >> >>Gerd >> > >You're probally right that the prototype fools the compiler. However I think the >complier still has good oppotunities to eliminate those store/load instructions >when it comes to optimize the code. I think that too, but that is probably not so easy, considering alias pointer to the same address. >Now look at the prototype again: > >unsigned char _BitScanForward(unsigned long* index, unsigned long mask); > >It returns a boolean value. This is to handle the case where mask == 0. >I don't think BitScanForward needs such exception handler. I like this better: > >long _BitScanForward(long mask); > >But you need to add a warning in the documentation that the return value is >undefined if mask == 0 just as the one you can find in bsf/bsr documentation. Yes, i like the none pointer prototype much better too. But leaving an undefined return value for a C-instrinsic is probably too dangerous - even if it is most often used in this while(mask) or if(mask) context. I suggest returning an negative (or > 31,63) value for mask == 0. An additional jnc after bsf/bsr is not so expensive, if always not taken, specially compared to the huge latency of bsf/bsr. But it leaves the option to build the traverse loop without an explicit != zero test. > >I'm sure if it were of the simplier proptype, the current compiler would not >generate the redundant store/load instructions. > >Zhao Yes. Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.