Computer Chess Club Archives


Subject: Re: On SSE2-Intrinsics

Author: Gerd Isenberg

Date: 11:08:22 04/08/05


On April 06, 2005 at 19:00:37, Aart J.C. Bik wrote:

>On March 28, 2005 at 08:16:33, Gerd Isenberg wrote:
>
>>There is one (minor) problem in the code above, using _mm_cmpeq_epi32(f,f) with
>>not initialized, but same arguments as -1 setter forcing pcmpeqd xmmi,xmmi,
>>where the initial value of xmmi don't cares. The debug version fires a runtime
>>exception, but the release version was a few cycles faster, using not
>>initialized xmm-register variables. Unfortunately there is no -1 setter like
>>_mm_setzero_si128() for zero - at least i don't find one.
>>
>>Cheers,
>>Gerd
>
>Hi Gerd,
>Indeed, only a “0”-setter is supported, not a “-1”-setter. In a future release
>of the Intel compiler, we plan to optimize the instructions used to implement
>intrinsics like _mm_set1_epi32(c) for various values of the constant, so that
>the user does not have to worry about what sequence is best for a given
>platform. Your posting has given me a good reason to expedite this process!
>Thanks.
>
>Aart Bik
>http://www.aartbik.com/


Hi Aart,
yes, with such intrinsics the compiler can decide whether to load a SIMD constant
from memory or, for constants that are easy to compute, to build it with a few
prologue instructions, possibly breaking some early register stalls ;-)
A bytewise one-vector might be a bit too expensive to compute, though: it takes
two registers and three instructions to subtract minus one from zero bytewise.
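
For illustration, here is a minimal sketch of the "computed constant" idea discussed above. The function names (all_ones, bytewise_one, all_bytes_equal) are hypothetical and only serve the example. The all-ones vector is built by comparing a register with itself; the register is zeroed first so that debug builds do not complain about an uninitialized variable, at the cost of one extra instruction unless the compiler removes it. The bytewise one-vector is then 0 - (-1) per byte.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* All 128 bits set: every dword compares equal to itself (pcmpeqd). */
static __m128i all_ones(void) {
    __m128i z = _mm_setzero_si128();   /* defined value, so no debug-build trap */
    return _mm_cmpeq_epi32(z, z);      /* each dword becomes 0xFFFFFFFF */
}

/* 0x01 in each of the 16 bytes: 0 - (-1) = 1, computed per byte (psubb). */
static __m128i bytewise_one(void) {
    return _mm_sub_epi8(_mm_setzero_si128(), all_ones());
}

/* Hypothetical checking helper: returns 1 if all 16 bytes of v equal b. */
static int all_bytes_equal(__m128i v, unsigned char b) {
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, v);   /* unaligned store is always safe here */
    for (int i = 0; i < 16; ++i)
        if (out[i] != b) return 0;
    return 1;
}
```

Whether this sequence actually beats a 16-byte load from memory depends on the target and on the surrounding code, which is exactly why leaving the choice to the compiler (for example via _mm_set1_epi8(1)) is attractive.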

What I missed recently: an intrinsic based on movd to load a 32-bit integer from
memory into the lowest dword of an xmm register, zero-extending it to 64 bits,
let's say as the count operand for a variable packed shift.
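
One candidate for this is _mm_cvtsi32_si128 from emmintrin.h, which places a 32-bit integer in the lowest dword and zeroes the remaining bits. The sketch below uses it as the count for a 64-bit packed shift. The names shift_left_by and low_lane are hypothetical, and whether the compiler folds the memory read into a single movd xmm, m32 is an assumption that depends on the compiler and its optimization settings.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Shift both 64-bit lanes of x left by a count read from memory. */
static __m128i shift_left_by(__m128i x, const int *count) {
    __m128i c = _mm_cvtsi32_si128(*count);  /* low dword = *count, upper bits zero */
    return _mm_sll_epi64(x, c);             /* psllq with the count in a register */
}

/* Hypothetical checking helper: returns the low 64-bit lane of v. */
static uint64_t low_lane(__m128i v) {
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[0];
}
```

For example, with a count of 4 and every dword of x set to 1, each 64-bit lane 0x0000000100000001 becomes 0x0000001000000010.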

Another boring question, about using _mm_load_si128 and _mm_store_si128 on
properly 16-byte aligned data in memory that is already referred to as, or cast
to, __m128i[]. Is it ok to use the assignment operator, or is it really necessary
or safer to use _mm_load_si128, which only seems to act like a cast? Is the
assignment of __m128i compiler-implementation dependent? Are there backward
compatibility issues? Maybe I am missing something.

void foo (__m128i target[], const __m128i source[]) {
   __m128i a = source[0];   /* plain assignment instead of _mm_load_si128 */
   ...
   target[0] = a;           /* plain assignment instead of _mm_store_si128 */
   ...
}

versus

void foo (__m128i target[], const __m128i source[]) {
   __m128i a = _mm_load_si128(&source[0]);   /* explicit aligned load */
   ...
   _mm_store_si128(&target[0], a);           /* explicit aligned store */
   ...
}
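
To make the comparison concrete, here is a small self-checking sketch of both variants. The names copy_assign, copy_intrin and variants_agree are hypothetical. It only shows that both forms move the same 16 bytes when the data is 16-byte aligned; it does not prove that every compiler emits the same instructions. As far as I know, GCC's emmintrin.h implements _mm_load_si128 as a plain dereference, but treating assignment as equivalent on other compilers is an assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>     /* memcmp */

/* Variant 1: plain assignment through __m128i lvalues. */
static void copy_assign(__m128i target[], const __m128i source[]) {
    __m128i a = source[0];
    target[0] = a;
}

/* Variant 2: explicit aligned load and store intrinsics. */
static void copy_intrin(__m128i target[], const __m128i source[]) {
    __m128i a = _mm_load_si128(&source[0]);
    _mm_store_si128(&target[0], a);
}

/* Returns 1 if both variants copy exactly the source bytes. */
static int variants_agree(void) {
    __m128i src[1], t1[1], t2[1];       /* __m128i objects are 16-byte aligned */
    src[0] = _mm_set_epi32(4, 3, 2, 1);
    copy_assign(t1, src);
    copy_intrin(t2, src);
    return memcmp(t1, src, sizeof src) == 0 &&
           memcmp(t2, src, sizeof src) == 0;
}
```

Either way the alignment requirement stays the same; for data that may not be 16-byte aligned, _mm_loadu_si128 and _mm_storeu_si128 are the ones to use.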

I guess generating C code with intrinsics is your way of implementing
vectorisation in Intel C ;-)

Thanks for your patience,
Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700
