Computer Chess Club Archives


Subject: Re: Reverse Bitboards

Author: Matt Taylor

Date: 09:52:58 01/14/03


On January 13, 2003 at 20:17:36, Russell Reagan wrote:

>On January 13, 2003 at 18:30:05, Matt Taylor wrote:
>
>>I think the real bottleneck would be the misjudgement of the speed of MMX. It is
>>not as fast to respond as the integer units, though it maintains similar
>>throughput. Using MMX for 64-bit arithmetic is not worthwhile as the same
>>operations are available from the integer unit with lower setup costs. The only
>>advantages include a minor gain in parallelism in hand-tweaked code and
>>additional register space.
>
>Apparently if you use MMX correctly, it will be significantly faster than the
>corresponding routine written in C (if it relies on 64-bit operations). The
>primary example that comes to mind is that Gerd uses MMX in IsiChess to do
>64-bit operations in the KoggeStone algorithms. He said it gave him a small
>speed increase. Compare that with the same routines written in C, and the C
>routines will be significantly slower. I know this because I wrote a program
>using those routines in C and it got about 70 knps (compare with Crafty
>300-500knps), and all it did was alpha-beta, material + mobility eval, and
>nothing else. I tried several bitboard implementations, and the common factor in
>the slow ones was the C KoggeStone attack generation. Gerd didn't experience
>such a significant speed hit when he used his MMX routines. So it does appear
>that there is a misjudgement of the speed of using MMX, but I'm not sure whether
>it is an underestimation or overestimation.
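
For anyone who has not seen the routines being discussed, here is a rough sketch of what a Kogge-Stone style attack generator looks like in plain C. This is not Gerd's code or Russell's; the names (U64, south_occluded_fill, south_attacks) and the square mapping (a1 = bit 0, h8 = bit 63, so "south" is a right shift by 8) are assumptions made for illustration.

```c
#include <stdint.h>

typedef uint64_t U64;   /* one bit per square, a1 = bit 0 (assumed mapping) */

/* Fill southward from the sliders in gen, passing only through the
   empty squares in pro. Three doubling steps cover up to 7 squares. */
static U64 south_occluded_fill(U64 gen, U64 pro)
{
    gen |= pro & (gen >> 8);
    pro &=       (pro >> 8);
    gen |= pro & (gen >> 16);
    pro &=       (pro >> 16);
    gen |= pro & (gen >> 32);
    return gen;
}

/* Attacked squares: one more step south, so the first blocker is
   included and the slider's own square is dropped. */
static U64 south_attacks(U64 sliders, U64 empty)
{
    return south_occluded_fill(sliders, empty) >> 8;
}
```

Every line is a 64-bit shift, AND, or OR, and a full generator repeats this for all eight directions, which is why the cost of 64-bit operations matters so much here.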

MMX is probably faster than straight C in some cases, but if you write the
64-bit stuff in assembly using the main integer instructions, the integer
version will almost always be faster. The latency of a simple ALU instruction
(bitwise/arithmetic/conditional) is 1 clock, and it has been ever since the 486.
The latency for similar arithmetic MMX instructions on my Athlon is 2 clocks,
and on a Pentium 4 it is 2 or worse. On the same processors, you can usually do
a 64-bit operation with the integer unit in 1 clock.
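
To make the "64-bit operations on the integer unit" idea concrete, here is a minimal sketch in C of how a 64-bit value can be handled as two 32-bit halves. The type and function names (Pair64, pair_and, pair_shr8, pair_add) are hypothetical, and this only shows the data flow; it says nothing about actual cycle counts on any particular processor.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical helper type). */
typedef struct { uint32_t lo, hi; } Pair64;

static Pair64 pair_from_u64(uint64_t x)
{
    Pair64 r = { (uint32_t)x, (uint32_t)(x >> 32) };
    return r;
}

static uint64_t pair_to_u64(Pair64 a)
{
    return ((uint64_t)a.hi << 32) | a.lo;
}

/* Bitwise AND: the two halves do not depend on each other. */
static Pair64 pair_and(Pair64 a, Pair64 b)
{
    Pair64 r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

/* Logical shift right by 8: the low half takes 8 bits from the high half. */
static Pair64 pair_shr8(Pair64 a)
{
    Pair64 r;
    r.lo = (a.lo >> 8) | (a.hi << 24);
    r.hi = a.hi >> 8;
    return r;
}

/* Addition: the high half needs the carry out of the low half. */
static Pair64 pair_add(Pair64 a, Pair64 b)
{
    Pair64 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* unsigned wraparound signals carry */
    return r;
}
```

The bitwise case is the interesting one for bitboards: the two halves of an AND or OR are independent, so nothing forces one 32-bit instruction to wait on the other. Shifts and adds do couple the halves, as the last two functions show.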

The only advantage to MMX is the extra registers you now have access to, but in
my experience, code rarely saturates more than one of the 3 instruction sets
(integer, FP, vector). Furthermore, movement of data between MMX registers and
integer registers is horrifically slow, and if you mix with floating-point, you
have to execute another slow instruction -- emms.

I think greater performance can be achieved in hand-tweaked, purely-integer
assembly. Unfortunately I do not have time right now to prove that theory, but
if I ever get a chance, I will be sure to post some code.

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700
