Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: extracting bit from bitboard slow on P4 in C

Author: Vincent Diepeveen

Date: 11:49:40 11/18/02

Go up one level in this thread


On November 18, 2002 at 13:42:32, Robert Hyatt wrote:

The P4 needs in theory 2 cycles each shift. If my math was just 0.001% off
there, then Nalimov would already have posted it i'm sure of that :)

In fact he wants to do without shift instructions that's how matters are
here. But it says something on how important compilers are for crappy
processors!

>On November 18, 2002 at 11:02:48, Vincent Diepeveen wrote:
>
>>On November 18, 2002 at 11:01:28, Robert Hyatt wrote:
>>
>>>On November 18, 2002 at 10:20:49, Vincent Diepeveen wrote:
>>>
>>>>On November 17, 2002 at 13:30:22, Eugene Nalimov wrote:
>>>>
>>>>6 compares   times 0.5 cycles ==>  3 cycles
>>>>5 shiftlefts times 2   cycles ==> 10  cycles
>>>>
>>>>So it is like 13 cycles to get a single bit out of a bitboard in
>>>>this C code is very slow on the P4. 13 cycles is hell of a lot,
>>>>i can generate several moves in non-bitboards within that time!
>>>
>>>Unless something has changed, the P4 is clocked 2x internally.  A shift should
>>>take 1/2 clock cycle if I am reading their manual correctly...
>>
>>No it takes 2 cycles. The shift is hell slower than other operations
>>like ADD.
>
>My intel reference for the PIV says "up to 4 internal clock cycles (four
>micro-ops)" of
>latency.  A shift instruction can execute significantly faster than this
>depending on what
>else is mixed into the micro-op queue."
>
>But, knowing that, the above code can be handled without shifts anyway so it is
>really
>a moot point...
>
>
>>
>>>That turns this into 5.5 cycles which seems more reasonable.  No reason a shiftl
>>>would be that much slower than a compare...  And your counting doesn't make any
>>>sense to me anyway as 6 compares by themselves won't do a thing.  it would
>>>require
>>>jumps or cmov's or something...
>>>
>>>
>>>>
>>>>>On November 17, 2002 at 13:25:55, Vincent Diepeveen wrote:
>>>>>
>>>>>>On November 17, 2002 at 12:01:02, Gerd Isenberg wrote:
>>>>>>
>>>>>>>On November 17, 2002 at 10:53:05, Joel wrote:
>>>>>>>
>>>>>>>>Hey All,
>>>>>>>>
>>>>>>>>I am a 2nd year Uni student from Australia who has recently gotten into chess
>>>>>>>>programming. My first attempt was a simple array-based alpha-beta variant which
>>>>>>>>struggled to search more than 6 levels deep in most positions! I think that
>>>>>>>>might have something to do with the fact that there was no move ordering,
>>>>>>>>transposition table, an expensive evaluation function, no killer moves and weak
>>>>>>>>coding :)
>>>>>>>>
>>>>>>>>I have been working on my second attempt for some time now. It uses Bitboards. I
>>>>>>>>have a few questions regarding move generation.
>>>>>>>>
>>>>>>>>It seems to me that the performance of the Bitboard approach relies somewhat
>>>>>>>>heavily on how fast you can retrieve the position of a 1 bit within a 64-bit
>>>>>>>>unsigned integer. I looked for sometime on the Internet for some kind of
>>>>>>>>magical, hacky solution to this dilemna, and the best I could find was this (b &
>>>>>>>>-b) trick which I used in a debatedly clever way. I was just wondering if there
>>>>>>>>is any approach significantly better than the one which I will outline below:
>>>>>>>>
>>>>>>>>1. (b & -b) to clear all 1 bit's except for one.
>>>>>>>>2. get this value, mod it by 67 (which has the property that every possible
>>>>>>>>   value returned is unique, thus i can hash to the position of the bit in the
>>>>>>>>   64 bit integer.)
>>>>>>>>
>>>>>>>>I am no expert, but it doesn't seem too ineffecient to me. Any problems?
>>>>>>>>
>>>>>>>>Also, if there are any improvements, I would prefer to find out about the ones
>>>>>>>>which do not involve assembly coding - I do not want to make my program too
>>>>>>>>dependant on any one CPU architecture at this stage.
>>>>>>>>
>>>>>>>>Thanks for your time,
>>>>>>>>Joel
>>>>>>>
>>>>>>>Hi Joel,
>>>>>>>
>>>>>>>nice idea with the mod 67 to get unique bitvalues. But i fear a 64 bit div/mod
>>>>>>>operation is too slow, even with 64bit-processors. If you don't want to use
>>>>>>>assembler eg. intels x86 bsf (bit scan foreward), i think using a lookup table
>>>>>>>indexed by the byte or 16-bit word is most common.
>>>>>>>
>>>>>>>I use this approach:
>>>>>>>
>>>>>>>#define LOWBOARD(bb) (*((UINT32*)&(bb)))
>>>>>>>#define HIGHBOARD(bb) (*(((UINT32*)&(bb))+1))
>>>>>>>
>>>>>>>// precondition: bb not null
>>>>>>>__forceinline UINT BitSearch(BitBoard bb)
>>>>>>>{
>>>>>>>        ASSERT(bb != 0);
>>>>>>>#ifdef	_M_IX86
>>>>>>>	__asm
>>>>>>>	{
>>>>>>>		bsf	eax,[bb+4]
>>>>>>>		xor	eax, 32
>>>>>>>		bsf	eax,[bb]
>>>>>>>	}
>>>>>>>#else
>>>>>>>	BitBoard lsbb = bb & (-(__int64)bb);
>>>>>>>	UINT32 lsb = LOWBOARD(lsbb) | HIGHBOARD(lsbb);
>>>>>>>	return ((((((((((HIGHBOARD(lsbb)!=0) <<1)
>>>>>>>			+((lsb & 0xffff0000)!=0))<<1)
>>>>>>>			+((lsb & 0xff00ff00)!=0))<<1)
>>>>>>>			+((lsb & 0xf0f0f0f0)!=0))<<1)
>>>>>>>			+((lsb & 0xcccccccc)!=0))<<1)
>>>>>>>			+((lsb & 0xaaaaaaaa)!=0);
>>>>>>>#endif
>>>>>>>}
>>>>>>>
>>>>>>>Regards,
>>>>>>>Gerd
>>>>>>
>>>>>>Isn't shifting on the P4 *WAY* slower than his mod 67 idea is?
>>>>>
>>>>>No.
>>>>>
>>>>>Thanks,
>>>>>Eugene



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.