Author: Gerd Isenberg
Date: 07:09:31 12/03/02
Go up one level in this thread
On December 03, 2002 at 06:40:36, Gian-Carlo Pascutto wrote: >On December 02, 2002 at 19:43:15, Gerd Isenberg wrote: > >>Congrats, Walter!! >> >>10-bit pattern bsf PI2FD btr c LSB_64 >>0x0000000011111133 15.3 18.0 19.1 22.8 17.8 >>0x1010111010101110 19.7 18.5 19.6 23.4 17.8 >>0x1111113300000000 20.6 18.0 19.1 22.8 17.8 > >Does this mean that this is the fastest currently known >FindFirstAndRemove *and* that it doesn't need assembly? Hi Gian-Carlo, Yes, it seems so. Walter's "magic" routine has only cheap instructions and one lookup with a cache friendly table of 154 Bytes. Even if the conditional bsf-approach is obviously faster, if we have one-bit's in the lower half, i hate this kind of asymmetry. Let's see how fast Hammer's bsf reg64,reg64 will be. If it sucks like Athlon, there are a lot of good alternatives. I guess that with 64-bit arithmetic, to do a fast single bit isolation the trick with 64int to 32bit float conversion is promising. What i like on Walter's approach is the kind of folding down the sequence of consecutive trailing one's (SingleBit - 1) from one 64-bit word to one 32 bit word without loosing any information. eg 0000..011|1111...1111 1111...1100 Then this wonderful magic xor, some parallel prefix shifts with add and sub, to get a unique value in this 0..153 range. Great respect to Walter, to invent these algorthm. Looking for something similar, with two bitboards (fromBB, toBB of valid moves) to get unique value in a range of let say 0 to 64*28-1 or 2047! > >Impressive for sure! > >Any similar tricks for lightning-speed popcounting? I fear with random bits it's not so easy, but who knows... > >What's the fastest sequence you've got for popcounting so >far? (preferably one that doesn't depend too much on new >instructions) I use the Athlon MMX-routine provided by AMD's Optimization Guide based on this c-routine: UINT32 BitCount (BitBoard bb) { static const UINT32 m1 = 0x55555555; static const UINT32 m2 = 0x33333333; static const UINT32 m3 = 0x0f0f0f0f; UINT32 l = (UINT32) bb; UINT32 h = (UINT32)(bb>>32); l = l - ((l>>1) & m1); h = h - ((h>>1) & m1); l = (l & m2) + ((l>>2) & m2); h = (h & m2) + ((h>>2) & m2); l = (l & m3) + ((l>>4) & m3); h = (h & m3) + ((h>>4) & m3); l = (l & 0x0ffff) + (l>>16); h = (h & 0x0ffff) + (h>>16); return ((l & 0x0ff) + (l>>8) + (h & 0x0ff) + (h>>8)); } For rarely populated bitboards (< 8), i use a count loop with bb &= bb-1. In this context i use also boolean inlnes like BOOL IsBitCountGreaterOne(BitBoard bb); for statements like this if ( BitCount(bb) > 1 ) I prefere methods without table-lookups, even if they are a bit faster in this kind of dumb loop measurement. But on the other hand these excessive computations, keeping most pipes busy, are not so hyperthreading friendly. Cheers, Gerd >-- >GCP
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.