Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitscan Conclusions

Author: Matt Taylor

Date: 15:54:21 01/05/03

Go up one level in this thread


On January 05, 2003 at 17:52:11, Alessandro Damiani wrote:

>On January 05, 2003 at 17:37:29, Matt Taylor wrote:
>
>>On January 05, 2003 at 17:25:48, Alessandro Damiani wrote:
>>
>>>On January 05, 2003 at 16:19:21, Matt Taylor wrote:
>>>
>>>>There is truly nothing that could waste more time than trying to write -the-
>>>>most optimal routine possible, but this has been much fun for me. I am using a
>>>>tool now to read exact cycle counts for the AMD Athlon (K7) processor.
>>>>
>>>>I have some insight now into the bsf instruction and its implementation on
>>>>different processors. The P5 (Pentium) and K6 processors both use microcode
>>>>loops, I believe. Not completely sure, but the timings would imply it as they
>>>>depend on the value of the operand. The P6 (PPro/Pentium 2/Pentium 3) implements
>>>>bsf/bsr with a priority encoder, which is why they are fast. The K7 (Athlon)
>>>>uses a binary search algorithm in microcode. I would presume the P7 is similar.
>>>>
>>>>Someone had posed the question, "How will Hammer's bsf instruction fare?" The
>>>>answer is that it will be at most 1 cycle slower than the K7's bsf assuming it
>>>>is implemented the same way.
>>>>
>>>>I am finishing up optimizing some of the proposed routines for Athlon. The bsf
>>>>routine takes something like 13-21 cycles depending on which word the bit lies
>>>>in. My favorite routine (binary search) optimized currently runs 15 cycles. I
>>>>think I can get yet another cycle or two back from it. Chances are that it will
>>>>run about as fast as the bsf instruction on Hammer. The difference is that it
>>>>costs almost 128 bytes and 4 registers instead of 3 bytes and 1 register.
>>>>
>>>>Anyone interested in the final results can e-mail me for the optimized routines.
>>>>The link I posted a while ago is no longer valid as I will be switching ISPs
>>>>early next week.
>>>>
>>>>-Matt
>>>
>>>Interesting stuff! I am especially interested in the comparison between Intel
>>>and AMD concerning bsf/bsr. I have got a Pentium 2.
>>>
>>>Alessandro
>>
>>bsf/bsr are 2 cycles on P6-core processors. This includes Pentium Pro, Pentium
>>2, Pentium 3, and old Celerons. The result is that a 64-bit bitscan routine
>>takes ~7-8 cycles whereas it takes ~21 cycles on Athlon (K7). The routine I've
>>been optimizing is now 14 cycles. This means that a 2 GHz Athlon (AthlonXP 2400)
>>will run an Athlon-optimized bitscan routine at the same speed that a 1 GHz
>>Pentium 3 will run the bsf version. The Hammer will do 64-bit natively, and the
>>bsf routine on Hammer will probably take about 10 cycles.
>>
>>-Matt
>
>Thanks for the info!
>
>Since some time I favoured the Athlon to the P4, but now I am disappointed. It
>seems that the next PC I will buy will be a 64 machine....when it will be cheap
>enough. *sigh*
>
>BTW do you know why AMD didn't improve their bsf/bsr to get near Intel's?
>
>Alessandro

I don't think the Pentium 4 has the 2-cycle bsr/bsf instruction. They don't list
it in the timing tables, so I'm not sure what the timing is. Likely they use a
binary search like AMD does because it has uniform search time and performs
relatively well. When they ramp the clock speed, those 8 cycles become cheaper
and cheaper... :-)

AMD did improve bsf/bsr from the K6, just not to 2-cycles. Making bsf/bsr
extremely efficient isn't that useful because most applications can't use the
instructions. That space/complexity is much better spent on things that benefit
most applications which in turn will benefit Chess, too. AMD's decision makes a
good tradeoff unless you want to build an ASIC for Chess.

BTW, since the K8 is really just a 64-bit K7, it should be in the same price
range.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.