Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: My Bit Scan Conclusion

Author: Walter Faxon

Date: 16:38:33 12/18/02

Go up one level in this thread


On December 18, 2002 at 19:32:39, Matt Taylor wrote:

>> Anyway,
>>>the fastest overall routines I got on Athlon were the GCC and VC routines with
>>>mine fluctuating near the same speed. The pi2fd comes in dead last, and bsf has
>>>the best times, but it does not have quite as good average-case time.
>>
>>Interesting. In my dumb loop test the pi2fd version i posted with bitboard
>>ref-parameter is _clearly_ the fastest on my Athlon XP 2.1+. Are your bitboards
>>properly aligned?
>
>Ok, after alignment, things change a bit. Thanks for pointing that out. :-)
>The pi2fd gets narrowly beaten by my code; all the versions posted here are
>slower than pi2fd.
>
>I'll post a comprehensive list of results in a day or two.
>
>>>Walter's routine could be made even faster, I think. The end does a number of
>>>numeric operations, but they're interdependant. It seems to me that this part of
>>>the algorithm could be further improved. I've been too busy to look further into
>>>it, but I think some of the reduction can be done in parallel. If so, my little
>>>MMX version of his code could become the most efficient one yet. It's also nice
>>>because it doesn't tie up precious GPRs.
>>
>>Yes, I will also try an mmx-implementation of Walter's LSB_64.
>>
>>Any documentation available, how fast bsf will be on hammer? In the german
>>computer chess magazin c't 26/02 (http://www.heise.de) there was a test with an
>>1GHz Athlon 64 prototype (Clawhammer), suggesting an equal
>>mmx/3DNow!-performance compared to an 1GHz Athlon 32. Promising results with
>>current 32-bit software due to better memory latency and bandwidth. But no
>>results so far with "real" 64-bit code.
>
>They didn't have a "real" 64-bit compiler. It's not much fun doing the machine
>code, particularly for Athlon64.
>
>I'd imagine that bsf will be about the same as current performance. As I
>understand, it's non-deterministic because it's just a microcode loop that looks
>for the first bit.
>
>There aren't any real figures because most people aren't concerned with the bsf
>instruction. The only timing I know about Athlon64 is that they have a
>throughput of 1 mul/cycle (twice as fast as Athlon).
>
>-Matt


Well, we've had a number of solutions suggested here, and maybe one of them
might be better than that microcode loop.  Maybe someone ought to clue in AMD?

By the way, how does something being a microcode loop make it non-deterministic?

-- Walter



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.