Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: My Bit Scan Conclusion

Author: Matt Taylor

Date: 16:32:39 12/18/02

Go up one level in this thread


> Anyway,
>>the fastest overall routines I got on Athlon were the GCC and VC routines with
>>mine fluctuating near the same speed. The pi2fd comes in dead last, and bsf has
>>the best times, but it does not have quite as good average-case time.
>
>Interesting. In my dumb loop test the pi2fd version i posted with bitboard
>ref-parameter is _clearly_ the fastest on my Athlon XP 2.1+. Are your bitboards
>properly aligned?

Ok, after alignment, things change a bit. Thanks for pointing that out. :-)
The pi2fd gets narrowly beaten by my code; all the versions posted here are
slower than pi2fd.

I'll post a comprehensive list of results in a day or two.

>>Walter's routine could be made even faster, I think. The end does a number of
>>numeric operations, but they're interdependant. It seems to me that this part of
>>the algorithm could be further improved. I've been too busy to look further into
>>it, but I think some of the reduction can be done in parallel. If so, my little
>>MMX version of his code could become the most efficient one yet. It's also nice
>>because it doesn't tie up precious GPRs.
>
>Yes, I will also try an mmx-implementation of Walter's LSB_64.
>
>Any documentation available, how fast bsf will be on hammer? In the german
>computer chess magazin c't 26/02 (http://www.heise.de) there was a test with an
>1GHz Athlon 64 prototype (Clawhammer), suggesting an equal
>mmx/3DNow!-performance compared to an 1GHz Athlon 32. Promising results with
>current 32-bit software due to better memory latency and bandwidth. But no
>results so far with "real" 64-bit code.

They didn't have a "real" 64-bit compiler. It's not much fun doing the machine
code, particularly for Athlon64.

I'd imagine that bsf will be about the same as current performance. As I
understand, it's non-deterministic because it's just a microcode loop that looks
for the first bit.

There aren't any real figures because most people aren't concerned with the bsf
instruction. The only timing I know about Athlon64 is that they have a
throughput of 1 mul/cycle (twice as fast as Athlon).

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.