Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Efficient Rotated Bitboard Representations

Author: Roberto Waldteufel

Date: 16:45:28 10/31/98

Go up one level in this thread



On October 31, 1998 at 10:44:25, Robert Hyatt wrote:

>On October 31, 1998 at 07:27:43, Bo Persson wrote:
>
>>On October 29, 1998 at 13:20:45, Roberto Waldteufel wrote:
>>[snip]
>>>Of course, once I started using assembler on the intel CPU, the bsf and bsr
>>>instructions were a god-send and I stopped all this nonsense with modular
>>>arithmetic....:-)
>>>
>>>Best wishes,
>>>Roberto
>>
>>I thought these instructions were microcoded and extremely slow on anything
>>newer than a 386! When testing on a Pentium, a table lookup was actually much
>>faster for me.
>>
>>
>>Bo Persson
>>bop@malmo.mail.telia.com
>
>
>ditto here.  But I only tested this on a P5 (non-mmx) when I started doing this
>stuff.  the instructions were grossly slow.  Someone even tried some hand-coded
>stuff to help and it was still slower than what I do at present (table lookup).

I don't know if these instructions are microcoded  or not, but if you guys both
say so then I guess they probably are. However, I just ran some tests with a
rather surprising outcome. I have recently developed a rotated bitboard
implementation for diagonal moves with a view to upgrading my program from
ordinary bitboards. Having got it working, I was interested to see how fast it
was, so I got it to generate attacks for 10 seconds (sequentially doing all 64
squares in turn repeatedly) and calculated the number of maps generated per
second, averaged out over several runs. I used a PII 333MHz (with MMX), and I
got almost exactly 400,000 maps per second. Then I tested my old assembler coded
method (using lots of bsf and bsr instructions!) for generating the same thing.
Since it was easy to do at the same time, I tested also rook and queen map
generations using this assembler method. In each case I had to insert two extra
instructions (pushad and popad) on account of the loop within which the code
ran, so the real values would be very slightly faster than what I measured:

Bishop attack maps:           580,000 maps per second
Rook attack maps:             560,000 maps per second
Queen attack maps:           510,000 maps per second

So the assembler version with the bsf and bsr instructions outperformed the
rotated bitboard generator by over 25%. I have not tested rooks and queens with
rotated bitboards, but I suspect the rooks would be similar to the bishops, and
the queens would take double the time using rotated bitboards, whereas my method
shows only a slight degradation in speed going from bishops or rooks to the
omnidirectional queen. I conclude that either

a) the assembler version is very efficient, or

b) I screwed up somehow in my rotated bitboard implementation.

I am quite ready to believe that (b) was the culprit, but even so, I think the
assembler routine is still pretty fast. Has anyone else tested this sort of
thing recently? I don't really know how much of a difference the PII technology
makes, but unless I can speed up the rotated bitboard method I will stick to the
assembler routine. Of course, whichever method is faster, I could still ditch
the incrementally updated attack maps and generate on the fly like Crafty does.

If bsf and bsr really are microcoded, would I not get much lower counts? Or
maybe I ought to be getting even higher than I am? I don't know exactly how fast
is fast in this context.

Best wishes,
Roberto



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.