Author: Gerd Isenberg
Date: 02:39:37 01/14/03
On January 13, 2003 at 20:17:36, Russell Reagan wrote:

>On January 13, 2003 at 18:30:05, Matt Taylor wrote:
>
>>I think the real bottleneck would be the misjudgement of the speed of MMX. It is
>>not as fast to respond as the integer units, though it maintains similar
>>throughput. Using MMX for 64-bit arithmetic is not worthwhile as the same
>>operations are available from the integer unit with lower setup costs. The only
>>advantages include a minor gain in parallelism in hand-tweaked code and
>>additional register space.
>
>Apparently if you use MMX correctly, it will be significantly faster than the
>corresponding routine written in C (if it relies on 64-bit operations). The
>primary example that comes to mind is that Gerd uses MMX in IsiChess to do
>64-bit operations in the KoggeStone algorithms. He said it gave him a small
>speed increase.

Yes and no. If you only switch your getSlideAttacks(int sq) from rotated lookup
to MMX fill routines, there is a slight slowdown. The speedup came from a
special MMX routine to detect pinned pieces (and remove checker) and a routine
to get all attacks by one side.

>Compare that with the same routines written in C, and the C
>routines will be significantly slower. I know this because I wrote a program
>using those routines in C and it got about 70 knps (compare with Crafty
>300-500knps), and all it did was alpha-beta, material + mobility eval, and
>nothing else. I tried several bitboard implementations, and the common factor in
>the slow ones was the C KoggeStone attack generation. Gerd didn't experience
>such a significant speed hit when he used his MMX routines. So it does appear
>that there is a misjudgement of the speed of using MMX, but I'm not sure whether
>it is an underestimation or overestimation.

If you look at execution latency, MMX instructions are slow (2 cycles versus 1
for and/or/xor). But the Athlon seems to process up to four (!) independent MMX
instructions simultaneously.
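For readers who have not seen the C KoggeStone generation discussed above, here is a minimal sketch of a fill in one direction, written in plain C. It assumes a little-endian rank-file mapping (a1 = bit 0, h1 = bit 7, so "east" is a left shift by one); the names east_occluded and east_attacks are hypothetical and are not taken from IsiChess or any other program mentioned in this thread.

```c
#include <stdint.h>

/* Mask that clears the a-file, so shifts toward the h-file cannot
   wrap around onto the next rank. */
#define NOT_A_FILE 0xfefefefefefefefeULL

/* Kogge-Stone occluded fill toward the east.
   gen: generator set (all sliders of one kind),
   pro: propagator set (the empty squares).
   Returns the sliders plus every empty square they can reach. */
uint64_t east_occluded(uint64_t gen, uint64_t pro)
{
    pro &= NOT_A_FILE;
    gen |= pro & (gen << 1);   /* reach 1 square  */
    pro &= pro << 1;
    gen |= pro & (gen << 2);   /* reach up to 3   */
    pro &= pro << 2;
    gen |= pro & (gen << 4);   /* reach up to 7   */
    return gen;
}

/* East attacks of all sliders at once: shift the fill one more step
   so that the first blocker (or the board edge) is included. */
uint64_t east_attacks(uint64_t sliders, uint64_t empty)
{
    return (east_occluded(sliders, empty) << 1) & NOT_A_FILE;
}
```

Note that the function takes a whole set of sliders rather than a single square, and that it needs only three shift steps regardless of how many pieces are in the set. A full generator would repeat this for the other seven directions.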
So that is what makes KoggeStone or floodFill competitive with rotated bitboards:

1. Gaining from the ability to get the attack sets of multiple pieces of one
   kind at once (requires some redesign of the bitboard infrastructure).
2. Parallel processing of two directions with KoggeStone, or up to four
   directions with dumb7fill (4 or 8 with hammer).
3. Doing most of the work in 64-bit (hammer gp or MMX) or 128-bit registers
   (SSE2).
4. Using KoggeStone, one may precompute and temporarily store the 8(7)*3
   direction propagators, which depend only on the empty square set, and use
   them later for the white/black rook, bishop and queen generators.

With MMX, using the o ^ (o - 2r) trick to get all "right" attacks is an
additional improvement, thanks to the SIMD byte instructions (psubb subtracts
each byte, i.e. each rank, independently):

  movq  mm3, mm5   ; occupied
  psubb mm5, mm1   ; occupied - rooks
  psubb mm5, mm1   ; occupied - 2*rooks
  pxor  mm5, mm3   ; right := occupied ^ (occupied - 2*rooks)

Regards,
Gerd
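As a rough illustration of the four-instruction MMX sequence above, the same computation can be emulated in portable C. This is only a sketch: psubb_emul and rook_right_attacks are hypothetical names, the byte loop stands in for the single psubb instruction, and it assumes each byte of the bitboard holds one rank with "right" meaning toward the higher bits, and that every rook is also set in the occupied board.

```c
#include <stdint.h>

/* Emulates MMX psubb: subtracts b from a in each of the 8 bytes
   separately, so a borrow never crosses into the next byte (rank). */
uint64_t psubb_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x - y) << (8 * i);
    }
    return r;
}

/* right := occupied ^ (occupied - 2*rooks), evaluated per rank.
   Subtracting the rooks twice mirrors the two psubb instructions. */
uint64_t rook_right_attacks(uint64_t occupied, uint64_t rooks)
{
    uint64_t diff = psubb_emul(psubb_emul(occupied, rooks), rooks);
    return occupied ^ diff;
}
```

With a rook on bit 0 and a blocker on bit 4 of the same byte, the subtraction borrows through the empty squares up to the blocker, and the xor leaves exactly the squares between the rook and the blocker, blocker included.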
Last modified: Thu, 15 Apr 21 08:11:13 -0700