Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: My Bit Scan Conclusion

Author: Matt Taylor

Date: 20:39:15 12/18/02

Go up one level in this thread


On December 18, 2002 at 19:38:33, Walter Faxon wrote:

>On December 18, 2002 at 19:32:39, Matt Taylor wrote:
>
>>> Anyway,
>>>>the fastest overall routines I got on Athlon were the GCC and VC routines with
>>>>mine fluctuating near the same speed. The pi2fd comes in dead last, and bsf has
>>>>the best times, but it does not have quite as good average-case time.
>>>
>>>Interesting. In my dumb loop test the pi2fd version i posted with bitboard
>>>ref-parameter is _clearly_ the fastest on my Athlon XP 2.1+. Are your bitboards
>>>properly aligned?
>>
>>Ok, after alignment, things change a bit. Thanks for pointing that out. :-)
>>The pi2fd gets narrowly beaten by my code; all the versions posted here are
>>slower than pi2fd.
>>
>>I'll post a comprehensive list of results in a day or two.
>>
>>>>Walter's routine could be made even faster, I think. The end does a number of
>>>>numeric operations, but they're interdependant. It seems to me that this part of
>>>>the algorithm could be further improved. I've been too busy to look further into
>>>>it, but I think some of the reduction can be done in parallel. If so, my little
>>>>MMX version of his code could become the most efficient one yet. It's also nice
>>>>because it doesn't tie up precious GPRs.
>>>
>>>Yes, I will also try an mmx-implementation of Walter's LSB_64.
>>>
>>>Any documentation available, how fast bsf will be on hammer? In the german
>>>computer chess magazin c't 26/02 (http://www.heise.de) there was a test with an
>>>1GHz Athlon 64 prototype (Clawhammer), suggesting an equal
>>>mmx/3DNow!-performance compared to an 1GHz Athlon 32. Promising results with
>>>current 32-bit software due to better memory latency and bandwidth. But no
>>>results so far with "real" 64-bit code.
>>
>>They didn't have a "real" 64-bit compiler. It's not much fun doing the machine
>>code, particularly for Athlon64.
>>
>>I'd imagine that bsf will be about the same as current performance. As I
>>understand, it's non-deterministic because it's just a microcode loop that looks
>>for the first bit.
>>
>>There aren't any real figures because most people aren't concerned with the bsf
>>instruction. The only timing I know about Athlon64 is that they have a
>>throughput of 1 mul/cycle (twice as fast as Athlon).
>>
>>-Matt
>
>
>Well, we've had a number of solutions suggested here, and maybe one of them
>might be better than that microcode loop.  Maybe someone ought to clue in AMD?
>
>By the way, how does something being a microcode loop make it non-deterministic?
>
>-- Walter

It doesn't. It's because the microcode loop is probably an early-out loop like
the 386 multiply unit. What I was getting at is that the instruction's execution
time depends on the data you give it. The other solutions posted here all work
in constant time neglecting all the bias factors (cache miss, interrupts, etc.).

Here's how Intel describes the BSF instruction in pseudocode:

IF SRC = 0
 THEN
     ZF = 1;
     DEST is undefined;
 ELSE
     ZF = 0;
     temp = 0;

     WHILE Bit(SRC, temp) = 0
     DO
         temp = temp + 1;
         DEST = temp;
     OD;
FI;

I would love to talk to AMD, but I'm afraid they wouldn't listen much to me. The
first question I would like to ask them is why my processors have an identity
crisis. I suspect the BIOS computes the frequency and programs the chip. It
would be rather amusing to make my chips call themselves something different.
:-)

I did a bit more fiddling today, and it seems that some of my past timing data
has been in error. I had some branches mispredicting because I allowed the
compiler to generate those branches, and it chose wrong. I've rewritten a lot of
the code so that I can test some 114 different patterns (0, -1, 1-bit patterns,
and a series of 96 2-bit patterns). I won't post all the data here because it'll
be too long; rather, I'll post a link, and probably within the next hour or two.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.