Author: Bas Hamstra
Date: 18:35:03 02/10/04
Go up one level in this thread
On February 10, 2004 at 11:28:48, Vincent Diepeveen wrote: >On February 10, 2004 at 10:57:15, Bas Hamstra wrote: > >>On February 09, 2004 at 09:07:41, Gerd Isenberg wrote: >> >>>On February 09, 2004 at 05:09:07, Renze Steenhuisen wrote: >>> >>>> >>>>Hi there! >>>> >>>>does anyone know the fastest way to determine the lcoation of the Least/Most >>>>Significant Bit, and to clear that bit? >>>> >>>>In my case the most important platforms (at the moment) are Pentium III and IV. >>>> >>>>Thanks! >>> >>>find lsb from CCC archives: >>> >>>http://chessprogramming.org/cccsearch/ccc.php?art_id=265616 >>> >>>Walter Faxon's magic BitScan: >>>http://chessprogramming.org/cccsearch/ccc.php?art_id=272378 >>> >>>Matt Taylor's De Bruijn folding trick: >>>http://chessprogramming.org/cccsearch/ccc.php?art_id=305965 >>> >>>Gerd >>> >>> >>>> >>>>Renze >> >>Gerd, >> >>IMO the BSF instruction is still fastest. Don't test this in loops, in a >>searching engine you get quite different results and BSF is in my case clearly >>the winner, especially for the paltforms he mentiones, and besides he is not >>talking 64 bit... >> >> >>Bas. > >It depends upon which CPU you use Bas. >In diep i'm using a simple table lookup for several years already; later i >found out that Nalimov had posted similar code here for 64 bits bitboards. > >But well i'm not 64 bits bitboards so in 32 bits it's a lot easier :) > >The reasons why i'm not using BSF has however a different cause than the reason >why Nalimov posted it without BSF here. I simply want to avoid inline assembly >wherever possible. > >For me that is a convincing argument favouring Eugene's code :) Hi Vincent, I am not sure you are right here. I tested on some Pentia and on my Athlon XP 1600+, the latter having the worst possible BSF. But even there BSF is clearly superior to any C code I know of. Table lookup is ok, I use a 256 char table lookup for PopCnt(), but it IS noticably slower than BSF for LastBit, even on the Athlon. For a BitBoard program that is probably more critical than for Diep. Cache is a strange thing. In a loop I had some software versions run faster than BSF (factor 3 even). But not embedded in the running engine. So I thought hey, cache is very important, I'll compact my rotated bitboard lookup tables to under 16k, that will speed me up. But no, it slowed me down. And the *only* thing I needed extra compared to normal, was a table-lookup in a tiny table. Go figure... Most efficient so far, seems to be the standard approach with 4x compressed lookup tables (64 bit "state" in stead of 256 bit). Best regards, Bas.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.