Author: Gerd Isenberg
Date: 11:08:22 04/08/05
On April 06, 2005 at 19:00:37, Aart J.C. Bik wrote:

>On March 28, 2005 at 08:16:33, Gerd Isenberg wrote:
>
>>There is one (minor) problem in the code above, using _mm_cmpeq_epi32(f,f) with
>>not initialized, but same arguments as -1 setter forcing pcmpeqd xmmi,xmmi,
>>where the initial value of xmmi don't cares. The debug version fires a runtime
>>exception, but the release version was a few cycles faster, using not
>>initialized xmm-register variables. Unfortunately there is no -1 setter like
>>_mm_setzero_si128() for zero - at least i don't find one.
>>
>>Cheers,
>>Gerd
>
>Hi Gerd,
>Indeed, only a “0”-setter is supported, not a “-1”-setter. In a future release
>of the Intel compiler, we plan to optimize the instructions used to implement
>intrinsics like _mm_set1_epi32(c) for various values of the constant, so that
>the user does not have to worry about what sequence is best for a given
>platform. Your posting has given me a good reason to expedite this process!
>Thanks.
>
>Aart Bik
>http://www.aartbik.com/

Hi Aart,

yes, with such intrinsics the compiler may decide whether to load a SIMD constant from memory or, for some easy-to-compute constants, to build it with a few prologue instructions, possibly breaking some early register stalls ;-) A bytewise one-vector might be a bit too expensive that way, since it takes two registers and three instructions to bytewise subtract minus one from zero.

What I missed recently is an intrinsic based on movd that loads a 32-bit integer from memory into the lowest dword of an xmm register, zero-extended to 64 bits, for example as the count operand of a variable p-shift.

Another boring question, about using _mm_load_si128 and _mm_store_si128 on properly 16-byte-aligned data in memory that is already referred to as, or cast to, __m128i[]. Is it OK to use the assignment operator, or is it really necessary or safer to use _mm_load_si128, which only seems to act like a cast? Is the assignment of __m128i compiler-implementation dependent? Are there backward-compatibility issues? Maybe I am missing something.
    void foo (__m128i target[], const __m128i source[])
    {
        __m128i a = source[0];
        ...
        target[0] = a;
        ...
    }

versus

    void foo (__m128i target[], const __m128i source[])
    {
        __m128i a = _mm_load_si128(&source[0]);
        ...
        _mm_store_si128(&target[0], a);
        ...
    }

I guess generating C code with intrinsics is your way of implementing vectorisation in the Intel C compiler ;-)

Thanks for your patience,
Gerd
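The constant setters, the movd-style shift count and the two copy styles discussed in this thread can be sketched with SSE2 intrinsics roughly as below. This is a minimal sketch, assuming an SSE2 target and a compiler that provides <emmintrin.h>; the helper names (all_ones_cmpeq, all_ones_set1, byte_ones, shl64_var, copy_assign, copy_intrin, equal128, copies_agree) are hypothetical and only for illustration. Zeroing the register with _mm_setzero_si128 before the self-compare keeps the pcmpeqd idiom well defined, at the cost of one pxor; whether _mm_set1_epi32(-1) is emitted as pcmpeqd or as a memory load depends on the compiler and version. _mm_cvtsi32_si128 maps to movd and zeroes the upper bits, but whether a given compiler folds a preceding memory load into a single movd xmm, m32 is not guaranteed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* All-ones via pcmpeqd, but comparing a defined (zeroed) register with itself
   instead of an uninitialized variable. */
static __m128i all_ones_cmpeq(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* Same constant, leaving the instruction choice to the compiler. */
static __m128i all_ones_set1(void)
{
    return _mm_set1_epi32(-1);
}

/* Bytewise one-vector: 0 - (-1) = 1 in every byte (pxor, pcmpeqd, psubb). */
static __m128i byte_ones(void)
{
    return _mm_sub_epi8(_mm_setzero_si128(), all_ones_cmpeq());
}

/* Variable shift of both 64-bit lanes: _mm_cvtsi32_si128 (movd) puts count
   into the low dword and zeroes the rest, which is the form psllq expects. */
static __m128i shl64_var(__m128i v, int count)
{
    return _mm_sll_epi64(v, _mm_cvtsi32_si128(count));
}

/* Copy by plain assignment of aligned __m128i objects. */
static void copy_assign(__m128i target[], const __m128i source[])
{
    __m128i a = source[0];
    target[0] = a;
}

/* Copy by explicit aligned load/store intrinsics (movdqa). */
static void copy_intrin(__m128i target[], const __m128i source[])
{
    __m128i a = _mm_load_si128(&source[0]);
    _mm_store_si128(&target[0], a);
}

/* 1 if all 16 bytes of x and y are equal, else 0. */
static int equal128(__m128i x, __m128i y)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
}

/* Runs both copy styles and checks that each reproduces the source. */
static int copies_agree(void)
{
    /* __m128i objects are 16-byte aligned by type, as movdqa requires. */
    __m128i src[1], dst_a[1], dst_b[1];
    src[0] = _mm_set_epi32(4, 3, 2, 1);
    copy_assign(dst_a, src);
    copy_intrin(dst_b, src);
    return equal128(dst_a[0], src[0]) && equal128(dst_b[0], src[0]);
}
```

Called from a main(), all_ones_cmpeq() and all_ones_set1() should compare equal, and copies_agree() should return 1. For __m128i objects declared as such, both copy styles are well defined; which instructions they turn into is up to the compiler, and mainstream compilers typically emit the same aligned moves for both.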