Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboards

Author: Dan Newman

Date: 01:25:31 12/10/98

Go up one level in this thread


On December 09, 1998 at 20:42:58, James Long wrote:

>On December 09, 1998 at 05:32:15, Dan Newman wrote:
>
>>On December 06, 1998 at 10:23:54, Robert Hyatt wrote:
>>
>>>I haven't finished this yet...  I've sent it to a couple of people in draft
>>>form when they asked...  And will do the same for you if you are interested.
>>>Note that it is a rough draft, reasonably complete, but not "polished" yet...
>>
>>I'd be interested in a copy too (hope you're not too inundated with
>>requests).   email:  dnewman@bellatlantic.net
>>
>>After reading the lastest posts on bitboards, I've begun trying my
>>hand at them (again).  The main sticking point with me was the cost
>>of bit index extraction--I was only able to get bitboard move
>>generation to go at about half the speed of 0x88.  But, I just did
>>an experiment with the BSF instruction (inline assembly) and have
>>gotten the best results yet--something less than 9 clocks per
>>bit-index on a P6.  My previous best was 16 clocks using pure C.
>>
>>One trick I just discovered (which is pretty obvious actually) is
>>using XOR to clear the bit:  I had been doing this:
>>
>>      bitboard &= ~(1 << index);
>>
>>but this can be done, saving one operation, like this:
>>
>>      bitboard ^= (1 << index);
>
>Better yet precompute the mask for each square's bit,
>and just do something like:
>
>      bitboard ^= bb_mask[index];

I just tried this out and found the on-the-fly mask was just a
tiny bit faster (9.2 clocks/index vs 9.6 clocks/index using
precomputed masks).  Aparently going to memory once costs more
than a couple of register ops--even when that memory data
stays in cache (the benchmark code was very small and so
the masks probably stuck around in the cache fairly well).

If this code were in a larger program, the precomputed masks might
well find themselves blown out of cache fairly frequently--so
it's possible that masks computed on-the-fly are quite a bit
better in a real program.  All this of course depends on a lot
of things like which processor, compiler, programmer, etc. is
involved...

[I should add that the 9.x clocks includes the loop overhead,
so that the bit index extraction is actually less than this.]

-Dan.

>
>--
>James
>
>
>>
>>since we know the bit is set.
>>
>>-Dan.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.