Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: extracting bit from bitboard slow on P4 in C

Author: Robert Hyatt

Date: 10:42:32 11/18/02

Go up one level in this thread


On November 18, 2002 at 11:02:48, Vincent Diepeveen wrote:

>On November 18, 2002 at 11:01:28, Robert Hyatt wrote:
>
>>On November 18, 2002 at 10:20:49, Vincent Diepeveen wrote:
>>
>>>On November 17, 2002 at 13:30:22, Eugene Nalimov wrote:
>>>
>>>6 compares   times 0.5 cycles ==>  3 cycles
>>>5 shiftlefts times 2   cycles ==> 10  cycles
>>>
>>>So it is like 13 cycles to get a single bit out of a bitboard in
>>>this C code is very slow on the P4. 13 cycles is hell of a lot,
>>>i can generate several moves in non-bitboards within that time!
>>
>>Unless something has changed, the P4 is clocked 2x internally.  A shift should
>>take 1/2 clock cycle if I am reading their manual correctly...
>
>No it takes 2 cycles. The shift is hell slower than other operations
>like ADD.

My intel reference for the PIV says "up to 4 internal clock cycles (four
micro-ops)" of
latency.  A shift instruction can execute significantly faster than this
depending on what
else is mixed into the micro-op queue."

But, knowing that, the above code can be handled without shifts anyway so it is
really
a moot point...


>
>>That turns this into 5.5 cycles which seems more reasonable.  No reason a shiftl
>>would be that much slower than a compare...  And your counting doesn't make any
>>sense to me anyway as 6 compares by themselves won't do a thing.  it would
>>require
>>jumps or cmov's or something...
>>
>>
>>>
>>>>On November 17, 2002 at 13:25:55, Vincent Diepeveen wrote:
>>>>
>>>>>On November 17, 2002 at 12:01:02, Gerd Isenberg wrote:
>>>>>
>>>>>>On November 17, 2002 at 10:53:05, Joel wrote:
>>>>>>
>>>>>>>Hey All,
>>>>>>>
>>>>>>>I am a 2nd year Uni student from Australia who has recently gotten into chess
>>>>>>>programming. My first attempt was a simple array-based alpha-beta variant which
>>>>>>>struggled to search more than 6 levels deep in most positions! I think that
>>>>>>>might have something to do with the fact that there was no move ordering,
>>>>>>>transposition table, an expensive evaluation function, no killer moves and weak
>>>>>>>coding :)
>>>>>>>
>>>>>>>I have been working on my second attempt for some time now. It uses Bitboards. I
>>>>>>>have a few questions regarding move generation.
>>>>>>>
>>>>>>>It seems to me that the performance of the Bitboard approach relies somewhat
>>>>>>>heavily on how fast you can retrieve the position of a 1 bit within a 64-bit
>>>>>>>unsigned integer. I looked for sometime on the Internet for some kind of
>>>>>>>magical, hacky solution to this dilemna, and the best I could find was this (b &
>>>>>>>-b) trick which I used in a debatedly clever way. I was just wondering if there
>>>>>>>is any approach significantly better than the one which I will outline below:
>>>>>>>
>>>>>>>1. (b & -b) to clear all 1 bit's except for one.
>>>>>>>2. get this value, mod it by 67 (which has the property that every possible
>>>>>>>   value returned is unique, thus i can hash to the position of the bit in the
>>>>>>>   64 bit integer.)
>>>>>>>
>>>>>>>I am no expert, but it doesn't seem too ineffecient to me. Any problems?
>>>>>>>
>>>>>>>Also, if there are any improvements, I would prefer to find out about the ones
>>>>>>>which do not involve assembly coding - I do not want to make my program too
>>>>>>>dependant on any one CPU architecture at this stage.
>>>>>>>
>>>>>>>Thanks for your time,
>>>>>>>Joel
>>>>>>
>>>>>>Hi Joel,
>>>>>>
>>>>>>nice idea with the mod 67 to get unique bitvalues. But i fear a 64 bit div/mod
>>>>>>operation is too slow, even with 64bit-processors. If you don't want to use
>>>>>>assembler eg. intels x86 bsf (bit scan foreward), i think using a lookup table
>>>>>>indexed by the byte or 16-bit word is most common.
>>>>>>
>>>>>>I use this approach:
>>>>>>
>>>>>>#define LOWBOARD(bb) (*((UINT32*)&(bb)))
>>>>>>#define HIGHBOARD(bb) (*(((UINT32*)&(bb))+1))
>>>>>>
>>>>>>// precondition: bb not null
>>>>>>__forceinline UINT BitSearch(BitBoard bb)
>>>>>>{
>>>>>>        ASSERT(bb != 0);
>>>>>>#ifdef	_M_IX86
>>>>>>	__asm
>>>>>>	{
>>>>>>		bsf	eax,[bb+4]
>>>>>>		xor	eax, 32
>>>>>>		bsf	eax,[bb]
>>>>>>	}
>>>>>>#else
>>>>>>	BitBoard lsbb = bb & (-(__int64)bb);
>>>>>>	UINT32 lsb = LOWBOARD(lsbb) | HIGHBOARD(lsbb);
>>>>>>	return ((((((((((HIGHBOARD(lsbb)!=0) <<1)
>>>>>>			+((lsb & 0xffff0000)!=0))<<1)
>>>>>>			+((lsb & 0xff00ff00)!=0))<<1)
>>>>>>			+((lsb & 0xf0f0f0f0)!=0))<<1)
>>>>>>			+((lsb & 0xcccccccc)!=0))<<1)
>>>>>>			+((lsb & 0xaaaaaaaa)!=0);
>>>>>>#endif
>>>>>>}
>>>>>>
>>>>>>Regards,
>>>>>>Gerd
>>>>>
>>>>>Isn't shifting on the P4 *WAY* slower than his mod 67 idea is?
>>>>
>>>>No.
>>>>
>>>>Thanks,
>>>>Eugene



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.