Author: Matt Taylor
Date: 16:32:39 12/18/02
Go up one level in this thread
> Anyway, >>the fastest overall routines I got on Athlon were the GCC and VC routines with >>mine fluctuating near the same speed. The pi2fd comes in dead last, and bsf has >>the best times, but it does not have quite as good average-case time. > >Interesting. In my dumb loop test the pi2fd version i posted with bitboard >ref-parameter is _clearly_ the fastest on my Athlon XP 2.1+. Are your bitboards >properly aligned? Ok, after alignment, things change a bit. Thanks for pointing that out. :-) The pi2fd gets narrowly beaten by my code; all the versions posted here are slower than pi2fd. I'll post a comprehensive list of results in a day or two. >>Walter's routine could be made even faster, I think. The end does a number of >>numeric operations, but they're interdependant. It seems to me that this part of >>the algorithm could be further improved. I've been too busy to look further into >>it, but I think some of the reduction can be done in parallel. If so, my little >>MMX version of his code could become the most efficient one yet. It's also nice >>because it doesn't tie up precious GPRs. > >Yes, I will also try an mmx-implementation of Walter's LSB_64. > >Any documentation available, how fast bsf will be on hammer? In the german >computer chess magazin c't 26/02 (http://www.heise.de) there was a test with an >1GHz Athlon 64 prototype (Clawhammer), suggesting an equal >mmx/3DNow!-performance compared to an 1GHz Athlon 32. Promising results with >current 32-bit software due to better memory latency and bandwidth. But no >results so far with "real" 64-bit code. They didn't have a "real" 64-bit compiler. It's not much fun doing the machine code, particularly for Athlon64. I'd imagine that bsf will be about the same as current performance. As I understand, it's non-deterministic because it's just a microcode loop that looks for the first bit. There aren't any real figures because most people aren't concerned with the bsf instruction. The only timing I know about Athlon64 is that they have a throughput of 1 mul/cycle (twice as fast as Athlon). -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.