Computer Chess Club Archives


Subject: Re: Bitscanning on the Opteron

Author: Russell Reagan

Date: 10:53:08 12/05/03


On December 05, 2003 at 08:52:10, Anthony Cozzie wrote:

>Three questions:
>
>1. What is the performance of the hardware bitscan, that is:
>
>__asm("bsf %0, %1")

I didn't benchmark it this time. IIRC, I wrote this test a while back, when
Gerd and others here were discussing bitscanning for the Athlon. The bsf
instruction is ridiculously fast on a P3, something like 4 cycles, but more
like 40 on the Athlon. I believe many of the routines that I used in this test
were faster than the bsr/bsf approach.

I'll see if I can find my gcc version of Dann's bsf bitscanning routine and see
how it does.
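
For anyone who wants to rerun the comparison, here is a minimal sketch of the two kinds of routine being compared: the hardware scan and a software scan. It is not Dann's routine or the code from my test. The function names are made up for illustration, the inline asm assumes gcc or clang on x86-64, and the software version is the widely published De Bruijn multiplication method, used here only as a stand-in for "a software bitscan".

```c
#include <stdint.h>

/* Hardware scan: index of the least significant set bit.
   The result is undefined for b == 0, so callers must check first. */
static int bitscan_hw(uint64_t b) {
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t idx;
    /* AT&T operand order: source first, destination second. */
    __asm__("bsfq %1, %0" : "=r"(idx) : "rm"(b) : "cc");
    return (int)idx;
#else
    /* Fallback assumption: a compiler that provides this gcc-style builtin. */
    return __builtin_ctzll(b);
#endif
}

/* Lookup table matching the De Bruijn constant used below. */
static const int index64[64] = {
     0,  1, 48,  2, 57, 49, 28,  3,
    61, 58, 50, 42, 38, 29, 17,  4,
    62, 55, 59, 36, 53, 51, 43, 22,
    45, 39, 33, 30, 24, 18, 12,  5,
    63, 47, 56, 27, 60, 41, 37, 16,
    54, 35, 52, 21, 44, 32, 23, 11,
    46, 26, 40, 15, 34, 20, 31, 10,
    25, 14, 19,  9, 13,  8,  7,  6
};

/* Software scan: isolate the lowest set bit, multiply by a De Bruijn
   constant, and use the top 6 bits of the product as a table index.
   Also requires b != 0. */
static int bitscan_debruijn(uint64_t b) {
    const uint64_t debruijn64 = 0x03f79d71b4cb0a89ULL;
    return index64[((b & (0 - b)) * debruijn64) >> 58];
}
```

Both functions take a nonzero 64-bit bitboard and return a square index from 0 to 63. Which one is faster depends on the CPU, so the two need to be timed on the actual target.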


>2. It would be very interesting to try again with the order of the masks
>changed.  Eugene's version has an easy time of it because of the order (I
>think).

I'm not sure whether that's the reason. Eugene said that the compiler should
translate this code into branchless code, and that it would execute in
something like 9 or 12 cycles on the Itanium, which I guess would be similar
on the Opteron for this case.
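
To show what "branchless" could look like for a mask-based scan, here is a rough sketch. It is not Eugene's code, which isn't reproduced in this post. The function name and the particular mask sequence are my own assumptions. Each step tests the low half of what remains with a mask and turns the result of the comparison into a shift count, so there is nothing for the compiler to branch on.

```c
#include <stdint.h>

/* Index of the least significant set bit, found by halving the search
   range six times. Requires b != 0. Each comparison yields 0 or 1, which
   is scaled into a shift amount instead of being used in an if. */
static int bitscan_masks(uint64_t b) {
    int n = 0;
    int s;

    s = ((b & 0xFFFFFFFFULL) == 0) << 5;  /* low 32 bits empty: skip 32 */
    b >>= s; n += s;
    s = ((b & 0xFFFFULL) == 0) << 4;      /* low 16 bits empty: skip 16 */
    b >>= s; n += s;
    s = ((b & 0xFFULL) == 0) << 3;        /* low 8 bits empty: skip 8 */
    b >>= s; n += s;
    s = ((b & 0xFULL) == 0) << 2;         /* low 4 bits empty: skip 4 */
    b >>= s; n += s;
    s = ((b & 0x3ULL) == 0) << 1;         /* low 2 bits empty: skip 2 */
    b >>= s; n += s;
    n += ((b & 0x1ULL) == 0);             /* lowest bit empty: skip 1 */

    return n;
}
```

Whether a given compiler really emits this without branches, and how many cycles it takes, has to be checked in the generated assembly for each target.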


