Author: Matt Taylor
Date: 22:56:13 02/11/03
Go up one level in this thread
On February 11, 2003 at 14:49:37, Walter Faxon wrote: >Yay, Matt! > >I'm probably not going to get a chance to closely examine your work again soon, >but here's a little doc. problem: "...converts it to an index counting up from >0 to 31..." should of course be 0 to 63. Good point. Fixed. I'll update the copy on the FTP. Once again, address is: ftp://24.73.126.6/lsb.c ftp://24.73.126.6/lsb.h >Also, for the C routine after: > > #ifdef __LSB_USE_C_INLINE > #define __LSB_USE_TABLE > static _INLINE _I32_TYPE BitSearchResetInternal(_LSB_I64 *bb) > >I don't see where you're resetting the bit. Oops! I tested the 3 inline assembly versions, but I forgot that I'd even included an inline C version. At your suggestion I've already reimplemented it using the table-based version. (It was supposed to be that to begin with as you will notice it defines __LSB_USE_TABLE.) The reimplementation correctly resets the bit. >Once you're satisfied with everything you should encourage chess programmers to >experiment with your routines and give you some feedback. For example, IIRC, >Uri said that the table version was definitely faster than the bsf version when >run in test mode, but slower when run inside his program. He attributed this to >his using a lot of loops with the former's larger code size greatly increasing >his program's cache footprint. (Maybe he somehow duplicated the table for each >loop or routine?? That would lead to cache misses.) One advantage to bsf is that it can be more readily inlined. I quickly poured through the 3 GCC versions of the routine and came up with the following sizes (approximations): fast bsf - 27 bytes slow bsf - 31 bytes table - 50 bytes The table-based routine is a good 4-5 cycles faster than the "slow bsf" on Athlon. However, if the compiler decides to generate a function call, that speed will be completely negated with something like 8-10 cycles function call overhead. There is a general rule for deciding when to inline (something like less than x instructions or x bytes), but I can't remember what it is now. I believe I read it in the K7 optimization manual. Alternatively, a trade-off can be made. If you find a few particular hotspots, you can use an inline table version for the hotspots and inline slow bsf (or fast bsf on P6) in other places. Maybe in places that are infrequently hit (if you are that lucky), don't use inline assembly at all. Maybe. Tweaking the program to that degree is annoying, and one can only hope that profiling tools will make these trade-offs for you. >Also, you might consider promulgating your results to other sites on >bit-fiddling tricks, e.g., Magic algorithms from The Aggregate >(http://aggregate.org/MAGIC/). They probably want pure C versions. Probably, and I'm not being particularly clever at the C-level; I only used suggestions you gave. Their tzc (Trailing Zero Count) is not particularly efficient, but it isn't too bad either. The main problem is that they subtract one and popcount. I did examine this technique when working on optimizing the bitscan routines, but it was rather inefficient because the popcount the popcount is inefficient (for this problem). The popcount can't make assumptions about relative bit locations -- assumptions we -can- make since we know there are i contiguous, right-aligned 1's in (x & -x) - 1, where i is the index of the least-signficiant bit in x. Your table solution is good, and it also seems to be compiler-friendly. Until I made a few reductions, the code that VC 7 generated was faster than my own. The binary search is also very good. >Anyway, thanks again! I'm sure your work will help a lot of people. Well, a >few dozen, anyway! > >-- Walter I hope it was worth more than an exercise. :-) I'm thinking of adding popcount straight from AMD's manual. I may in the future add optimized versions of bitscan for other chips, and I may add other versions (i.e. with no side-effects, zero bitboard allowed, etc.) If I manage to complete anything else, I'll post about it here. -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.