Computer Chess Club Archives




Subject: Re: SSE2 bit[64] * byte[64] dot product

Author: Gerd Isenberg

Date: 08:36:33 07/22/04


On July 22, 2004 at 10:15:33, Fabien Letouzey wrote:

>On July 22, 2004 at 09:56:50, Anthony Cozzie wrote:
>>If _you_ were only running integer code and suddenly saw your opponent executing
>>vector instructions, wouldn't you get a little scared?  Psychology is half the
>>battle . . .
>Indeed, not to mention C vs. ASM.
>I would like to know how much slower the portable solution you proposed is as
>compared with the ASM code.

Since SSE2 is currently so slow and requires two 64-bit ALU operations per 128-bit
instruction, portable SWAR code as mentioned by Anthony might even be faster. On the
other hand, SSE2 has some potential to run in parallel with other independent
general-purpose instructions, and future cores may have much faster SSE(2) units,
as AMD has mentioned.
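Anthony's portable code is not quoted here, so the following is only a hypothetical sketch of what a portable SWAR bit[64] * byte[64] dot product could look like in C. It assumes unsigned byte weights and that square i corresponds to bit i of the bitboard. Each bitboard byte is spread into an 8-byte 0x00/0xFF mask with one multiply, the mask selects eight weights at once, and neighbouring bytes are summed into 16-bit lanes. The names dot_swar, dot_loop, bits_to_byte_mask and load8 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch, not the code from the thread.
   Assumes unsigned weights and square i == bit i of bb. */

/* Reference version for checking: add w[sq] for every set bit sq. */
static unsigned dot_loop(uint64_t bb, const uint8_t w[64])
{
    unsigned sum = 0;
    for (int sq = 0; sq < 64; ++sq)
        if ((bb >> sq) & 1)
            sum += w[sq];
    return sum;
}

/* For b < 256: byte i of the result is 0xFF if bit i of b is set, else 0x00. */
static uint64_t bits_to_byte_mask(uint64_t b)
{
    uint64_t x = (b * 0x0101010101010101ULL)  /* copy b into all 8 bytes */
               & 0x8040201008040201ULL;        /* byte i keeps only bit i */
    /* adding 0x7F sets the top bit of every nonzero byte, without carries */
    x = ((x + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
    return x * 0xFF;                           /* 0x01 -> 0xFF per byte */
}

/* Load 8 weights so that p[j] ends up in byte j, independent of endianness. */
static uint64_t load8(const uint8_t *p)
{
    uint64_t v = 0;
    for (int j = 7; j >= 0; --j)
        v = (v << 8) | p[j];
    return v;
}

static unsigned dot_swar(uint64_t bb, const uint8_t w[64])
{
    const uint64_t lo8 = 0x00FF00FF00FF00FFULL;
    uint64_t acc = 0;                          /* four 16-bit lanes */
    for (int row = 0; row < 8; ++row) {
        uint64_t m = bits_to_byte_mask((bb >> (8 * row)) & 0xFF);
        uint64_t v = load8(w + 8 * row) & m;   /* weights of set squares, others 0 */
        acc += (v & lo8) + ((v >> 8) & lo8);   /* pairwise byte sums, <= 510 per lane */
    }
    /* fold the four 16-bit lanes into one sum */
    acc = (acc & 0x0000FFFF0000FFFFULL) + ((acc >> 16) & 0x0000FFFF0000FFFFULL);
    return (unsigned)((acc & 0xFFFFFFFFULL) + (acc >> 32));
}
```

The 16-bit lanes cannot overflow: each row adds at most 2 * 255 = 510 per lane, so eight rows stay below 4096. Signed weights would need a bias or separate handling.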

>Sorry that I couldn't resist joking, but just try to read the code aloud.  When
>I used ASM, mnemonics were made of only 4 or 5 letters.

Yes, the parallel Unpack and Interleave mnemonics are rather different:

punpck {h|l} {bw|wd|dq|qdq}
  p unpck   parallel unpack
  h | l     high | low
  bw        byte to word
  wd        word to double
  dq        double to quad
  qdq       quad to double quad
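To make the interleave concrete, here is a hypothetical scalar C model of the eight 128-bit (xmm) forms punpck{l|h}{bw|wd|dq|qdq}. It assumes little-endian byte order, with byte 0 being the least significant byte of the register; the function punpck() and its parameters are invented for illustration, and dst must not overlap a or b.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of punpck{l|h}{bw|wd|dq|qdq} on 16-byte registers.
   high = 0 unpacks the low 8 bytes (l), high = 1 the high 8 bytes (h).
   elem = 1 (bw), 2 (wd), 4 (dq) or 8 (qdq) bytes per source element.
   Result: a0 b0 a1 b1 ... interleaved element by element. */
static void punpck(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16],
                   int high, int elem)
{
    int n = 8 / elem;           /* elements taken from each source half */
    int base = high ? 8 : 0;    /* byte offset of the half being unpacked */
    for (int i = 0; i < n; ++i) {
        memcpy(dst + 2 * i * elem,       a + base + i * elem, (size_t)elem);
        memcpy(dst + (2 * i + 1) * elem, b + base + i * elem, (size_t)elem);
    }
}
```

With a = 0..15 and b = 16..31, the low byte form (punpcklbw) yields 0, 16, 1, 17, ..., 7, 23, while the high quad form (punpckhqdq) yields bytes 8..15 of a followed by bytes 8..15 of b.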

Using SSE2 intrinsics with the newest MSC is more like inline assembly with gcc.
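As an illustration of how the punpck chain can feed a bit[64] * byte[64] dot product through SSE2 intrinsics, here is a hypothetical sketch; it is not the code discussed in this thread. It assumes an x86 target with SSE2 (little-endian), unsigned byte weights, and square i == bit i. The bitboard bytes are broadcast with punpcklbw/punpck{l|h}wd/punpck{l|h}dq so that each 16-byte vector holds two rows, pcmpeqb turns the selected bits into 0xFF masks, and psadbw against zero sums the masked weights. The names dot_sse2 and row_pair_sum are made up.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical sketch: sums the weights of two rows.
   Byte j of 'rows' holds the bitboard byte of square j's row,
   byte j of 'sel' has bit (j % 8) set. */
static __m128i row_pair_sum(__m128i rows, __m128i sel, const uint8_t *w16)
{
    __m128i hit = _mm_cmpeq_epi8(_mm_and_si128(rows, sel), sel); /* 0xFF where bit set */
    __m128i v   = _mm_and_si128(hit, _mm_loadu_si128((const __m128i *)w16));
    return _mm_sad_epu8(v, _mm_setzero_si128()); /* byte sum of each 64-bit half */
}

static unsigned dot_sse2(uint64_t bb, const uint8_t w[64])
{
    const __m128i sel = _mm_setr_epi8(1, 2, 4, 8, 16, 32, 64, -128,
                                      1, 2, 4, 8, 16, 32, 64, -128);
    __m128i x   = _mm_loadl_epi64((const __m128i *)&bb); /* b0..b7 in low quadword */
    __m128i bw  = _mm_unpacklo_epi8(x, x);     /* punpcklbw: b0 b0 b1 b1 ... b7 b7 */
    __m128i wdl = _mm_unpacklo_epi16(bw, bw);  /* punpcklwd: b0 x4 ... b3 x4 */
    __m128i wdh = _mm_unpackhi_epi16(bw, bw);  /* punpckhwd: b4 x4 ... b7 x4 */
    /* punpck{l|h}dq: each result holds two rows, each row byte repeated 8 times */
    __m128i acc = row_pair_sum(_mm_unpacklo_epi32(wdl, wdl), sel, w);       /* rows 0,1 */
    acc = _mm_add_epi64(acc, row_pair_sum(_mm_unpackhi_epi32(wdl, wdl), sel, w + 16));
    acc = _mm_add_epi64(acc, row_pair_sum(_mm_unpacklo_epi32(wdh, wdh), sel, w + 32));
    acc = _mm_add_epi64(acc, row_pair_sum(_mm_unpackhi_epi32(wdh, wdh), sel, w + 48));
    /* add the two 64-bit partial sums (each at most 4 * 2040) */
    return (unsigned)_mm_cvtsi128_si32(acc)
         + (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
}
```

The same source compiles with MSC and gcc, which is the appeal of intrinsics over compiler-specific inline assembly; whether it beats the portable SWAR version depends on how fast the SSE2 unit of the target core is.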


>>P.S. Congrats on what looks like a win for Fruit in RWBC Class C.
>With so many rounds Fruit is bound to play Jonny at some point despite the
>latter's bad overall score, so anything can still happen.  One could say that
>thanks to this, Jonny still has a (good) chance to qualify, as it deserves.


Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.