Computer Chess Club Archives



Subject: Re: Reverse Bitboards

Author: Gerd Isenberg

Date: 02:39:37 01/14/03



On January 13, 2003 at 20:17:36, Russell Reagan wrote:

>On January 13, 2003 at 18:30:05, Matt Taylor wrote:
>
>>I think the real bottleneck would be the misjudgement of the speed of MMX. It is
>>not as fast to respond as the integer units, though it maintains similar
>>throughput. Using MMX for 64-bit arithmetic is not worthwhile as the same
>>operations are available from the integer unit with lower setup costs. The only
>>advantages include a minor gain in parallelism in hand-tweaked code and
>>additional register space.
>
>Apparently if you use MMX correctly, it will be significantly faster than the
>corresponding routine written in C (if it relies on 64-bit operations). The
>primary example that comes to mind is that Gerd uses MMX in IsiChess to do
>64-bit operations in the KoggeStone algorithms. He said it gave him a small
>speed increase.


Yes and no. If you only replace your rotated-lookup getSlideAttacks(int sq) with
MMX fill routines, there is a slight slowdown. The speedup came from a special
MMX routine to detect pinned pieces (and remove checkers) and a routine to get
all attacks of one side.


>Compare that with the same routines written in C, and the C
>routines will be significantly slower. I know this because I wrote a program
>using those routines in C and it got about 70 knps (compare with Crafty
>300-500knps), and all it did was alpha-beta, material + mobility eval, and
>nothing else. I tried several bitboard implementations, and the common factor in
>the slow ones was the C KoggeStone attack generation. Gerd didn't experience
>such a significant speed hit when he used his MMX routines. So it does appear
>that there is a misjudgement of the speed of using MMX, but I'm not sure whether
>it is an underestimation or overestimation.

If you look at execution latency, MMX instructions are slow (2 versus 1 cycle
for and/or/xor). But the Athlon seems to process up to four(!) independent
MMX instructions simultaneously, so the latency is hidden as long as there are
enough independent instructions.

So this is what makes KoggeStone or flood fill competitive with rotated bitboards:
1. Exploiting the ability to get the attack sets of multiple pieces of one kind
   at once (requires some redesign of the bitboard infrastructure).
2. Processing two directions in parallel with KoggeStone,
   or up to four directions with dumb7fill (4 or 8 with Hammer).
3. Doing most of the work in 64-bit registers (Hammer general-purpose or MMX)
   or in 128-bit registers (SSE2).
4. With KoggeStone, one may precompute and temporarily store
   the 8(7)*3 direction propagators, which depend only on the set of empty
   squares, and use them later for the white/black rook, bishop
   and queen generators.
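A minimal C sketch of points 1 and 4, assuming a square mapping with a1 = bit 0, h1 = bit 7 and a8 = bit 56 (the post does not state one). The names Propagators, make_propagators, kogge_stone_fill and kogge_stone_attacks are hypothetical. It covers only the four "positive" directions (left shifts by 1, 7, 8 and 9) and uses plain 64-bit integers one direction at a time rather than MMX, so it shows the data flow, not the parallel speedup:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

/* Assumed square mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56. */
#define ALL_SQUARES 0xffffffffffffffffULL
#define NOT_A_FILE  0xfefefefefefefefeULL
#define NOT_H_FILE  0x7f7f7f7f7f7f7f7fULL

/* The three Kogge-Stone propagators of one direction. They depend only
   on the empty-square set, so they can be built once and reused for
   white/black rooks, bishops and queens. */
typedef struct {
    int shift;   /* 8 = north, 1 = east, 9 = northeast, 7 = northwest */
    U64 mask;    /* clears the file a shift would wrap onto */
    U64 p[3];    /* propagators for steps of 1, 2 and 4 squares */
} Propagators;

static Propagators make_propagators(U64 empty, int shift, U64 mask) {
    Propagators pr;
    /* Only left-shift directions are supported by this sketch. */
    assert(shift == 1 || shift == 7 || shift == 8 || shift == 9);
    pr.shift = shift;
    pr.mask = mask;
    pr.p[0] = empty & mask;                          /* 1-step empties */
    pr.p[1] = pr.p[0] & (pr.p[0] << shift);          /* 2 in a row */
    pr.p[2] = pr.p[1] & (pr.p[1] << (2 * shift));    /* 4 in a row */
    return pr;
}

/* Occluded fill of all generators at once (e.g. every white rook),
   extending each one through empty squares in one direction. */
static U64 kogge_stone_fill(U64 gen, const Propagators *pr) {
    gen |= pr->p[0] & (gen << pr->shift);
    gen |= pr->p[1] & (gen << (2 * pr->shift));
    gen |= pr->p[2] & (gen << (4 * pr->shift));
    return gen;
}

/* One more shift turns the fill into attacks, including the blocker. */
static U64 kogge_stone_attacks(U64 sliders, const Propagators *pr) {
    return (kogge_stone_fill(sliders, pr) << pr->shift) & pr->mask;
}
```

For example, with empty = ~occupied, the north propagators from make_propagators(empty, 8, ALL_SQUARES) can be passed to kogge_stone_attacks once for white rooks | white queens and again for black rooks | black queens, since only the generator set changes between the calls.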

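With MMX, using the occupied ^ (occupied - 2*rooks) trick to get all "right"
attacks is an additional improvement, thanks to the SIMD byte instructions:
psubb subtracts each byte (rank) separately, so no borrow leaks into the next rank.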

movq	mm3, mm5	; occupied
psubb	mm5, mm1	; occupied -   rooks
psubb	mm5, mm1	; occupied - 2*rooks
pxor	mm5, mm3	; right := occupied ^ (occupied - 2*rooks)
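A portable C sketch of the four instructions above, assuming one byte per rank and "right" meaning toward higher bit indices within each byte. psubb64 and right_attacks are hypothetical names; psubb64 emulates the bytewise psubb with a standard SWAR trick instead of MMX intrinsics:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

#define HIGH_BITS 0x8080808080808080ULL

/* Portable emulation of MMX psubb: subtract y from x in each of the
   eight bytes independently, discarding borrows at byte boundaries. */
static U64 psubb64(U64 x, U64 y) {
    /* Set bit 7 of every byte of x and clear it in y, so the low seven
       bits can never borrow out of their byte. */
    U64 low = (x | HIGH_BITS) - (y & ~HIGH_BITS);
    /* Fix bit 7 of each byte to x7 ^ y7 ^ borrow-in. */
    return low ^ ((x ^ ~y) & HIGH_BITS);
}

/* "Right" attacks of all rooks, rank by rank:
   occupied ^ (occupied - 2*rooks). Rooks must be part of occupied. */
static U64 right_attacks(U64 occupied, U64 rooks) {
    assert((rooks & ~occupied) == 0);
    U64 diff = psubb64(occupied, rooks); /* psubb mm5, mm1 */
    diff = psubb64(diff, rooks);         /* psubb mm5, mm1 */
    return diff ^ occupied;              /* pxor  mm5, mm3 */
}
```

A plain 64-bit subtraction would let a borrow from one rank leak into the next: right_attacks(0x80, 0x80), a lone rook on the highest bit of the first byte, returns 0 only because psubb64 discards the borrow at the byte boundary.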

Regards,
Gerd





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.