Author: Russell Reagan
Date: 10:53:08 12/05/03
On December 05, 2003 at 08:52:10, Anthony Cozzie wrote:
>Three questions:
>
>1. What is the performance of the hardware bitscan, that is:
>
>__asm("bsf %0, %1")
I didn't benchmark it this time, but IIRC I wrote this test a while back when
Gerd and others here were discussing bitscanning on the Athlon. The bsf
instruction on a P3 is ridiculously fast, something like 4 cycles, versus
roughly 40 on the Athlon. I believe many of the routines I used in this test
were faster than the bsr/bsf approach.
I'll see if I can find my gcc version of Dann's bsf bitscanning routine and see
how it does.
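For reference, a minimal sketch of a hardware bitscan in gcc-flavoured C. It
assumes GCC or Clang; on x86-64 it uses inline bsf, elsewhere it falls back to
__builtin_ctzll. This is not Dann's routine, and lsb_hw/lsb_loop are just
illustrative names. Note that in AT&T syntax the source operand comes first.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the least significant set bit of b via the CPU's
   bit-scan instruction. b must be nonzero: BSF leaves its destination
   undefined for a zero input, and __builtin_ctzll(0) is undefined too. */
static inline int lsb_hw(uint64_t b)
{
    assert(b != 0);
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t idx;
    /* AT&T order: "bsfq src, dst". BSF writes ZF, hence the "cc" clobber. */
    __asm__("bsfq %1, %0" : "=r"(idx) : "rm"(b) : "cc");
    return (int)idx;
#else
    /* GCC/Clang builtin; on x86 it typically compiles to bsf or tzcnt. */
    return __builtin_ctzll(b);
#endif
}

/* Plain shift loop, used only as a reference to check the fast version. */
static int lsb_loop(uint64_t b)
{
    int i = 0;
    while (!(b & 1)) { b >>= 1; i++; }
    return i;
}
```

A timing loop around lsb_hw versus the table/mask routines is what would
actually answer the P3 vs. Athlon question; the cycle counts above are from
memory.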
>2. It would be very interesting to try again with the order of the masks
>changed. Eugene's version has an easy time of it because of the order (I
>think).
I'm not sure whether that's the reason. Eugene said the compiler should
translate this code into branchless code, and that it would execute in roughly
9 or 12 cycles on the Itanium, which I'd guess is similar to what the Opteron
would do in this case.
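For comparison, a minimal sketch of the general mask technique. This is not
Eugene's actual code (which isn't shown here), and lsb_masks is a hypothetical
name. After isolating the lowest set bit with b & -b, each mask tests one bit
of the index, so the six tests are independent of one another. A compiler can
turn each test into a setcc with no branches.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless LSB index (0..63); b must be nonzero. */
static int lsb_masks(uint64_t b)
{
    assert(b != 0);
    uint64_t x = b & (0 - b);   /* keep only the lowest set bit */
    int r = 0;
    /* Each mask has a 1 exactly at the positions whose index has bit k set. */
    r |= ((x & 0xFFFFFFFF00000000ULL) != 0) << 5;
    r |= ((x & 0xFFFF0000FFFF0000ULL) != 0) << 4;
    r |= ((x & 0xFF00FF00FF00FF00ULL) != 0) << 3;
    r |= ((x & 0xF0F0F0F0F0F0F0F0ULL) != 0) << 2;
    r |= ((x & 0xCCCCCCCCCCCCCCCCULL) != 0) << 1;
    r |= ((x & 0xAAAAAAAAAAAAAAAAULL) != 0);
    return r;
}
```

Since no test depends on another, reordering the masks can't change the
result. Whether it changes the timing is something only the benchmark can tell.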