Author: Filip Tvrzsky
Date: 18:46:27 06/27/02
On June 27, 2002 at 11:49:39, Sune Fischer wrote:
>On June 27, 2002 at 11:42:35, Robert Hyatt wrote:
>
>>On June 26, 2002 at 04:38:25, Sune Fischer wrote:
>>
>>>On June 26, 2002 at 02:02:28, Bruce Moreland wrote:
>>>
>>>>On June 25, 2002 at 05:26:34, Sune Fischer wrote:
>>>>
>>>>>On June 24, 2002 at 20:33:32, Russell Reagan wrote:
>>>>>
>>>>>>I am not an experienced bitboard user. I've been thinking about trying them out
>>>>>>just to see how they work in comparison with my current 0x88 implementation. Is
>>>>>>there any potential problem in mixing the two approaches? One problem that I can
>>>>>>see would be that if you have an index for a bitboard, it's not going to index
>>>>>>into the same square in your 0x88 array. I suppose the only real reason to keep
>>>>>>0x88 would be for efficient edge detection. Do bitboards offer a solution to
>>>>>>edge detection? Or do they even need edge detection when using bitboards (ex.
>>>>>>using the BSF asm instruction would seem to avoid edge detection altogether)?
>>>>>>
>>>>>>Thanks,
>>>>>>Russell
>>>>>
>>>>>For edge detection you can do this:
>>>>>
>>>>>if (((uint64)1<<square)&0xFF818181818181FF)
>>>>> ...
>>>>>
>>>>>It will tell you when you are on the first or last file or rank.
>>>>>I use that in my SEE to raytrace behind the attacking piece, something like
>>>>>
>>>>>while (!(((uint64)1<<square)&0xFF818181818181FF)) {
>>>>> square+=direction;
>>>>> if (there-is-an-attacker-on-square)
>>>>> add-it-to-the-list-of-attackers
>>>>>}
>>>>>
>>>>
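For anyone who wants to try the edge test above in isolation, here is a small compilable sketch of the same idea, assuming squares are numbered a1 = 0 ... h8 = 63 (8*rank + file); the names EDGE_MASK and on_edge are made up for the example, not taken from the quoted engine:

#include <stdint.h>
#include <stdio.h>

/* Ranks 1 and 8 plus files a and h: the squares where a sliding ray
   has reached the edge of the board. */
#define EDGE_MASK 0xFF818181818181FFULL

static int on_edge(int square)
{
    return (((uint64_t)1 << square) & EDGE_MASK) != 0;
}

int main(void)
{
    /* e4 (28) is an interior square, a4 (24) and h1 (7) are on the edge. */
    printf("e4: %d  a4: %d  h1: %d\n", on_edge(28), on_edge(24), on_edge(7));
    return 0;
}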
>>>>That shift has to be lots of fun for the processor.
>>>
>>>That is why I really use a Mask[square] table; the shift was just for the sake of clarity... :)
>>>Anyway, I changed it to a macro, so now I can quickly test which is faster, and
>>>it seems the Mask[] lookup is faster, for the moment.
>>>
>>>-S.
>>>
>>
>>If it is faster, I am surprised. I.e. the shift should run like the blazes
>>since there are no memory references (an immediate value of 1, shifted N
>>bits, avoids memory) while a memory load can be expensive depending on
>>whether the mask is in main memory or in the L1/L2/L3 cache...
>>
>>That was a common trick on the Cray where _all_ memory references (prior to
>>the T90) were real memory accesses (no cache).
>
>To be honest, I think I would need a specially designed test to get an answer
>here; the overall speed difference is so small that it's just lost in the noise.
>
>-S.
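The Mask[square] table versus shift question from the quoted exchange can be isolated in a few lines; this sketch uses a compile-time switch so either form can be timed, and the names Mask, init_masks and BIT_OF are assumptions for the example rather than anyone's actual engine code:

#include <stdint.h>
#include <stdio.h>

static uint64_t Mask[64];

static void init_masks(void)
{
    int sq;
    for (sq = 0; sq < 64; sq++)
        Mask[sq] = (uint64_t)1 << sq;
}

/* Compile with -DUSE_MASK_TABLE to use the lookup instead of the shift. */
#ifdef USE_MASK_TABLE
#  define BIT_OF(sq) (Mask[(sq)])
#else
#  define BIT_OF(sq) ((uint64_t)1 << (sq))
#endif

int main(void)
{
    init_masks();
    printf("%016llx\n", (unsigned long long)BIT_OF(28));  /* bit for e4 */
    return 0;
}

Which form wins depends on whether Mask[] stays in L1 cache and on how well the shift pipelines with the surrounding code, which is exactly why the difference tends to disappear into noise inside a full engine.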
If you (or anyone else) __really__ :-) want to test this on the x86 platform, maybe
I could help you, since I have written a couple of simple test functions in x86
assembly for exactly this purpose: one is placed at the start of the code segment
you are interested in, and the second at the end. You get back an array indexed by
the number of CPU clocks spent in the measured code segment, where each entry tells
you how many times the segment took that particular clock count. It uses the rdtsc
CPU instruction. I also have an output function in C++, but its messages are only
in Czech, so it is rather unintelligible, I am afraid.
Filip
>>
>>>>bruce
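A rough C sketch of the kind of measurement Filip describes, using gcc inline assembly for rdtsc instead of his separate x86 assembly routines; the names timer_start, timer_stop, hist and HIST_SIZE are invented for the example, and the histogram simply counts how many times the measured segment took a given number of clocks:

#include <stdint.h>
#include <stdio.h>

#define HIST_SIZE 1024                /* clock counts above this are clipped */

static uint32_t hist[HIST_SIZE];
static uint64_t t_start;

static uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

static void timer_start(void) { t_start = read_tsc(); }

static void timer_stop(void)
{
    uint64_t dt = read_tsc() - t_start;
    if (dt >= HIST_SIZE)
        dt = HIST_SIZE - 1;
    hist[dt]++;                       /* hist[n] = times the segment took n clocks */
}

int main(void)
{
    volatile uint64_t sink = 0;
    int i, n;

    for (i = 0; i < 100000; i++) {
        timer_start();
        sink |= (uint64_t)1 << (i & 63);      /* the code segment under test */
        timer_stop();
    }
    for (n = 0; n < HIST_SIZE; n++)
        if (hist[n])
            printf("%4d clocks: %u times\n", n, (unsigned)hist[n]);
    return 0;
}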