Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Efficient Rotated Bitboard Representations

Author: Robert Hyatt

Date: 17:57:09 10/31/98

Go up one level in this thread


On October 31, 1998 at 19:45:28, Roberto Waldteufel wrote:

>
>On October 31, 1998 at 10:44:25, Robert Hyatt wrote:
>
>>On October 31, 1998 at 07:27:43, Bo Persson wrote:
>>
>>>On October 29, 1998 at 13:20:45, Roberto Waldteufel wrote:
>>>[snip]
>>>>Of course, once I started using assembler on the intel CPU, the bsf and bsr
>>>>instructions were a god-send and I stopped all this nonsense with modular
>>>>arithmetic....:-)
>>>>
>>>>Best wishes,
>>>>Roberto
>>>
>>>I thought these instructions were microcoded and extremely slow on anything
>>>newer than a 386! When testing on a Pentium, a table lookup was actually much
>>>faster for me.
>>>
>>>
>>>Bo Persson
>>>bop@malmo.mail.telia.com
>>
>>
>>ditto here.  But I only tested this on a P5 (non-mmx) when I started doing this
>>stuff.  the instructions were grossly slow.  Someone even tried some hand-coded
>>stuff to help and it was still slower than what I do at present (table lookup).
>
>I don't know if these instructions are microcoded  or not, but if you guys both
>say so then I guess they probably are. However, I just ran some tests with a
>rather surprising outcome. I have recently developed a rotated bitboard
>implementation for diagonal moves with a view to upgrading my program from
>ordinary bitboards. Having got it working, I was interested to see how fast it
>was, so I got it to generate attacks for 10 seconds (sequentially doing all 64
>squares in turn repeatedly) and calculated the number of maps generated per
>second, averaged out over several runs. I used a PII 333MHz (with MMX), and I
>got almost exactly 400,000 maps per second. Then I tested my old assembler coded
>method (using lots of bsf and bsr instructions!) for generating the same thing.
>Since it was easy to do at the same time, I tested also rook and queen map
>generations using this assembler method. In each case I had to insert two extra
>instructions (pushad and popad) on account of the loop within which the code
>ran, so the real values would be very slightly faster than what I measured:
>
>Bishop attack maps:           580,000 maps per second
>Rook attack maps:             560,000 maps per second
>Queen attack maps:           510,000 maps per second
>
>So the assembler version with the bsf and bsr instructions outperformed the
>rotated bitboard generator by over 25%. I have not tested rooks and queens with
>rotated bitboards, but I suspect the rooks would be similar to the bishops, and
>the queens would take double the time using rotated bitboards, whereas my method
>shows only a slight degradation in speed going from bishops or rooks to the
>omnidirectional queen. I conclude that either
>
>a) the assembler version is very efficient, or
>
>b) I screwed up somehow in my rotated bitboard implementation.
>
>I am quite ready to believe that (b) was the culprit, but even so, I think the
>assembler routine is still pretty fast. Has anyone else tested this sort of
>thing recently? I don't really know how much of a difference the PII technology
>makes, but unless I can speed up the rotated bitboard method I will stick to the
>assembler routine. Of course, whichever method is faster, I could still ditch
>the incrementally updated attack maps and generate on the fly like Crafty does.
>
>If bsf and bsr really are microcoded, would I not get much lower counts? Or
>maybe I ought to be getting even higher than I am? I don't know exactly how fast
>is fast in this context.
>
>Best wishes,
>Roberto


note the architecture I said I tested on:  the original P5 with no mmx, no
fancy tricks or anything. The P6/PII has a totally different cpu core, and
the bsf/bsr might be much faster in them.  I have not tested this since the
P6/PII came out... but I will now...



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.