Author: Matt Taylor
Date: 09:52:58 01/14/03
On January 13, 2003 at 20:17:36, Russell Reagan wrote:

>On January 13, 2003 at 18:30:05, Matt Taylor wrote:
>
>>I think the real bottleneck would be the misjudgement of the speed of MMX. It is
>>not as fast to respond as the integer units, though it maintains similar
>>throughput. Using MMX for 64-bit arithmetic is not worthwhile as the same
>>operations are available from the integer unit with lower setup costs. The only
>>advantages include a minor gain in parallelism in hand-tweaked code and
>>additional register space.
>
>Apparently if you use MMX correctly, it will be significantly faster than the
>corresponding routine written in C (if it relies on 64-bit operations). The
>primary example that comes to mind is that Gerd uses MMX in IsiChess to do
>64-bit operations in the KoggeStone algorithms. He said it gave him a small
>speed increase. Compare that with the same routines written in C, and the C
>routines will be significantly slower. I know this because I wrote a program
>using those routines in C and it got about 70 knps (compare with Crafty
>300-500knps), and all it did was alpha-beta, material + mobility eval, and
>nothing else. I tried several bitboard implementations, and the common factor in
>the slow ones was the C KoggeStone attack generation. Gerd didn't experience
>such a significant speed hit when he used his MMX routines. So it does appear
>that there is a misjudgement of the speed of using MMX, but I'm not sure whether
>it is an underestimation or overestimation.

MMX is probably faster than straight C in some cases, but if you write the 64-bit stuff in assembly using the main integer instructions, it will almost always be faster. The latency of an ALU instruction (bitwise/arithmetic/conditional) is 1 clock, and it has been ever since the 486. The latency for similar arithmetic MMX instructions on my Athlon is 2 clocks, and on a Pentium 4 it is 2 or worse. On the same processors, you can usually do 64-bit operations in 1 clock.
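
For reference, the kind of C Kogge-Stone routine under discussion might look roughly like the sketch below. This is only an illustration, not Russell's or Gerd's actual code: the function names and the square mapping (bit 0 = a1, bit 63 = h8) are assumptions. On a 32-bit x86 target, the compiler has to carry out each 64-bit shift, AND, and OR on two 32-bit halves.

```c
#include <stdint.h>

/* Kogge-Stone occluded fill toward higher ranks ("north").
 * gen: bitboard of the sliding pieces (generators).
 * pro: bitboard of empty squares (propagators).
 * Assumed mapping (hypothetical, not from the post): bit 0 = a1, bit 63 = h8,
 * so shifting left by 8 moves every piece up one rank. Vertical shifts cannot
 * wrap between files, so no file masks are needed here. */
static uint64_t north_occluded_fill(uint64_t gen, uint64_t pro)
{
    gen |= pro & (gen << 8);   /* extend the fill by 1 rank           */
    pro &= pro << 8;           /* pairs of consecutive empty squares  */
    gen |= pro & (gen << 16);  /* extend by up to 2 more ranks        */
    pro &= pro << 16;          /* runs of 4 consecutive empty squares */
    gen |= pro & (gen << 32);  /* extend by up to 4 more ranks        */
    return gen;
}

/* Squares attacked to the north: one step past the fill, so the first
 * blocking piece is included in the attack set. */
static uint64_t north_attacks(uint64_t sliders, uint64_t empty)
{
    return north_occluded_fill(sliders, empty) << 8;
}
```

With a rook on a1 and a blocker on a5, north_attacks returns the squares a2 through a5. The other seven directions follow the same pattern with different shifts, plus file masks for the horizontal and diagonal ones.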
The only advantage to MMX is the extra registers you now have access to, but in my experience, code rarely saturates more than one of the three instruction sets (integer, FP, vector). Furthermore, moving data between MMX registers and integer registers is horrifically slow, and if you mix MMX with floating-point, you have to execute another slow instruction -- emms. I think greater performance can be achieved in hand-tweaked, purely integer assembly. Unfortunately I do not have time right now to prove that theory, but if I ever get a chance, I will be sure to post some code.

-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700