Author: Robert Hyatt
Date: 17:57:09 10/31/98
Go up one level in this thread
On October 31, 1998 at 19:45:28, Roberto Waldteufel wrote: > >On October 31, 1998 at 10:44:25, Robert Hyatt wrote: > >>On October 31, 1998 at 07:27:43, Bo Persson wrote: >> >>>On October 29, 1998 at 13:20:45, Roberto Waldteufel wrote: >>>[snip] >>>>Of course, once I started using assembler on the intel CPU, the bsf and bsr >>>>instructions were a god-send and I stopped all this nonsense with modular >>>>arithmetic....:-) >>>> >>>>Best wishes, >>>>Roberto >>> >>>I thought these instructions were microcoded and extremely slow on anything >>>newer than a 386! When testing on a Pentium, a table lookup was actually much >>>faster for me. >>> >>> >>>Bo Persson >>>bop@malmo.mail.telia.com >> >> >>ditto here. But I only tested this on a P5 (non-mmx) when I started doing this >>stuff. the instructions were grossly slow. Someone even tried some hand-coded >>stuff to help and it was still slower than what I do at present (table lookup). > >I don't know if these instructions are microcoded or not, but if you guys both >say so then I guess they probably are. However, I just ran some tests with a >rather surprising outcome. I have recently developed a rotated bitboard >implementation for diagonal moves with a view to upgrading my program from >ordinary bitboards. Having got it working, I was interested to see how fast it >was, so I got it to generate attacks for 10 seconds (sequentially doing all 64 >squares in turn repeatedly) and calculated the number of maps generated per >second, averaged out over several runs. I used a PII 333MHz (with MMX), and I >got almost exactly 400,000 maps per second. Then I tested my old assembler coded >method (using lots of bsf and bsr instructions!) for generating the same thing. >Since it was easy to do at the same time, I tested also rook and queen map >generations using this assembler method. In each case I had to insert two extra >instructions (pushad and popad) on account of the loop within which the code >ran, so the real values would be very slightly faster than what I measured: > >Bishop attack maps: 580,000 maps per second >Rook attack maps: 560,000 maps per second >Queen attack maps: 510,000 maps per second > >So the assembler version with the bsf and bsr instructions outperformed the >rotated bitboard generator by over 25%. I have not tested rooks and queens with >rotated bitboards, but I suspect the rooks would be similar to the bishops, and >the queens would take double the time using rotated bitboards, whereas my method >shows only a slight degradation in speed going from bishops or rooks to the >omnidirectional queen. I conclude that either > >a) the assembler version is very efficient, or > >b) I screwed up somehow in my rotated bitboard implementation. > >I am quite ready to believe that (b) was the culprit, but even so, I think the >assembler routine is still pretty fast. Has anyone else tested this sort of >thing recently? I don't really know how much of a difference the PII technology >makes, but unless I can speed up the rotated bitboard method I will stick to the >assembler routine. Of course, whichever method is faster, I could still ditch >the incrementally updated attack maps and generate on the fly like Crafty does. > >If bsf and bsr really are microcoded, would I not get much lower counts? Or >maybe I ought to be getting even higher than I am? I don't know exactly how fast >is fast in this context. > >Best wishes, >Roberto note the architecture I said I tested on: the original P5 with no mmx, no fancy tricks or anything. The P6/PII has a totally different cpu core, and the bsf/bsr might be much faster in them. I have not tested this since the P6/PII came out... but I will now...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.