Author: Gerd Isenberg
Date: 11:08:22 04/08/05
On April 06, 2005 at 19:00:37, Aart J.C. Bik wrote:
>On March 28, 2005 at 08:16:33, Gerd Isenberg wrote:
>
>>There is one (minor) problem in the code above: it uses _mm_cmpeq_epi32(f,f)
>>with an uninitialized argument, passed twice, as a -1 setter to force
>>pcmpeqd xmmi,xmmi, where the initial value of xmmi doesn't matter. The debug
>>build raises a runtime exception, but the release build was a few cycles faster
>>using uninitialized xmm register variables. Unfortunately there is no -1 setter
>>like _mm_setzero_si128() for zero, at least I couldn't find one.
>>
>>Cheers,
>>Gerd
>
>Hi Gerd,
>Indeed, only a “0”-setter is supported, not a “-1”-setter. In a future release
>of the Intel compiler, we plan to optimize the instructions used to implement
>intrinsics like _mm_set1_epi32(c) for various values of the constant, so that
>the user does not have to worry about what sequence is best for a given
>platform. Your posting has given me a good reason to expedite this process!
>Thanks.
>
>Aart Bik
>http://www.aartbik.com/
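A minimal sketch of the two options discussed above, assuming SSE2 and `<emmintrin.h>`: `_mm_set1_epi32(-1)` leaves the choice of instruction sequence to the compiler, while comparing a register zeroed with `_mm_setzero_si128()` against itself gives the pcmpeqd all-ones idiom without reading an uninitialized variable, at the cost of one extra pxor. The helper names `all_ones_set1`, `all_ones_cmpeq` and `same_bits` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Let the compiler decide how to materialize -1 in every lane
   (a memory constant or a register idiom). */
static __m128i all_ones_set1(void)
{
    return _mm_set1_epi32(-1);
}

/* pcmpeqd idiom without touching an uninitialized variable:
   a zeroed register compared with itself is equal in every lane,
   so every bit of the result is set. */
static __m128i all_ones_cmpeq(void)
{
    __m128i z = _mm_setzero_si128();  /* pxor z, z */
    return _mm_cmpeq_epi32(z, z);     /* pcmpeqd: all bits set */
}

/* Hypothetical check helper: nonzero if both vectors are bitwise identical. */
static int same_bits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

The zeroing step is the price for keeping debug builds quiet; whether the compiler then drops or keeps the extra pxor is up to its optimizer.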
Hi Aart,
yes, with such intrinsics the compiler may decide whether to load a SIMD constant
from memory or, for some easy-to-compute constants, to generate it with a few
prologue instructions, possibly avoiding some early register stalls ;-)
A bytewise vector of ones might be a bit too expensive that way, since it takes
two registers and three instructions: zero one register, set the other to all
ones (-1), then subtract -1 from zero bytewise.
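The three-instruction sequence might look like this with intrinsics. This is a sketch assuming SSE2; `bytewise_ones` is a hypothetical name, and the instructions in the comments are what the intrinsics nominally map to, not a guarantee of what a given compiler emits. `_mm_set1_epi8(1)` is the alternative that leaves the choice to the compiler, for example a load from a memory constant.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Bytewise vector of 1s from two registers and three instructions. */
static __m128i bytewise_ones(void)
{
    __m128i zero   = _mm_setzero_si128();         /* pxor    z, z              */
    __m128i minus1 = _mm_cmpeq_epi8(zero, zero);  /* pcmpeqb m, m -> 0xFF bytes */
    return _mm_sub_epi8(zero, minus1);            /* psubb: 0 - (-1) = 1 each   */
}
```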
Something I missed recently: an intrinsic based on movd that loads a 32-bit
integer from memory into the lowest dword of an xmm register, zero-extending it
to 64 bits, for instance as the count operand of a variable packed shift.
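For what it's worth, SSE2's `<emmintrin.h>` does declare `_mm_cvtsi32_si128`, which maps to movd and zeroes all bits above the low dword; whether the memory-operand form `movd xmm, m32` is used depends on the compiler folding the load. A sketch of using it as the count for a variable 64-bit shift, where `shl64_by` is a hypothetical helper: `_mm_sll_epi64` takes its shift count from the low 64 bits of its second operand.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Shift both 64-bit lanes of v left by a count read from memory.
   _mm_cvtsi32_si128 places the int in the lowest dword and zeroes the rest,
   so the low 64 bits form a clean count for psllq. */
static __m128i shl64_by(__m128i v, const int *count_in_memory)
{
    __m128i count = _mm_cvtsi32_si128(*count_in_memory); /* movd xmm, m32 (typically) */
    return _mm_sll_epi64(v, count);                      /* psllq xmm, xmm            */
}
```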
Another boring question, about using _mm_load_si128 and _mm_store_si128 on
properly 16-byte-aligned data in memory, already referenced or cast as __m128i[].
Is it OK to use the assignment operator, or is it really necessary (or safer) to
use _mm_load_si128, which only seems to act like a cast? Does the assignment of
__m128i depend on the compiler implementation? Are there backward-compatibility
issues? Maybe I'm missing something.
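#include <emmintrin.h>

void foo (__m128i target[], const __m128i source[]) {
  __m128i a = source[0];            /* plain assignment from aligned memory */
  /* ... */
  target[0] = a;                    /* plain assignment to aligned memory */
  /* ... */
}
versus
void foo (__m128i target[], const __m128i source[]) {
  __m128i a = _mm_load_si128(&source[0]);   /* explicit aligned load */
  /* ... */
  _mm_store_si128(&target[0], a);           /* explicit aligned store */
  /* ... */
}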
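A small, hedged comparison of the two forms, assuming a compiler (GCC, Clang, MSVC or ICC) that treats __m128i as an ordinary 16-byte-aligned type: in that case both loops below are well-defined on __m128i arrays and usually compile to the same aligned moves. GCC's `<emmintrin.h>`, for instance, defines `_mm_load_si128` as a plain dereference of its pointer argument. The intrinsics mainly make the alignment requirement explicit, and `_mm_loadu_si128`/`_mm_storeu_si128` are the variants for data that is not 16-byte aligned. `copy_assign` and `copy_intrinsic` are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Copy n vectors with plain assignment; __m128i objects are 16-byte
   aligned by their type, so this is typically an aligned load/store pair. */
static void copy_assign(__m128i target[], const __m128i source[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        __m128i a = source[i];
        target[i] = a;
    }
}

/* Same copy with explicit aligned load/store intrinsics (movdqa). */
static void copy_intrinsic(__m128i target[], const __m128i source[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        __m128i a = _mm_load_si128(&source[i]);
        _mm_store_si128(&target[i], a);
    }
}
```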
I guess generating C code with intrinsics is your way of implementing
vectorisation in Intel C ;-)
Thanks for your patience,
Gerd
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.