Author: Gerd Isenberg
Date: 02:13:43 04/01/05
It seems that the __m128i = _mm_load_si128(__m128i*) intrinsic for loading an xmm
register variable from memory is not necessary, at least with the MSVC 2005
beta compiler I use. One can simply assign the content of an __m128i pointer to that
variable in the usual C manner, which produces the same 16-byte-aligned movdqa instructions.
Hmm... the code immediately becomes more readable, like this max gem:
#include <emmintrin.h> // SSE2 intrinsics

int getMaxOf64(short s64[] /* aligned 16 */)
{
    __m128i* ps = (__m128i*) s64;
    __m128i x0 = ps[0]; // first 8 shorts
    __m128i x1 = ps[1]; // next 8 shorts
    // vertical max over the remaining six vectors, in two independent chains
    x0 = _mm_max_epi16 (x0, ps[2]);
    x1 = _mm_max_epi16 (x1, ps[3]);
    x0 = _mm_max_epi16 (x0, ps[4]);
    x1 = _mm_max_epi16 (x1, ps[5]);
    x0 = _mm_max_epi16 (x0, ps[6]);
    x1 = _mm_max_epi16 (x1, ps[7]);
    x1 = _mm_max_epi16 (x1, x0); // merge both chains
    // horizontal max: byte shifts of 2, 4 and 8 fold all 8 shorts into lane 0
    x0 = _mm_max_epi16 (x1, _mm_srli_si128 (x1, 2));
    x1 = _mm_max_epi16 (x0, _mm_srli_si128 (x0, 4));
    x0 = _mm_max_epi16 (x1, _mm_srli_si128 (x1, 8));
    // short cast is necessary for sign extension
    return (short)_mm_extract_epi16 (x0, 0);
}
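A minimal sketch for checking the above, assuming an SSE2-capable compiler in C99 mode. The helper names load_explicit, load_deref, loads_agree and scalar_max_of_64 are hypothetical and not part of the original post. It only confirms that the plain dereference and the intrinsic load yield the same 16 bytes; whether both compile to movdqa still has to be verified in the generated assembly of the compiler at hand.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 shorts with the explicit intrinsic. */
static __m128i load_explicit(const short *p)
{
    assert(((uintptr_t)p & 15) == 0);  /* aligned loads need 16-byte alignment */
    return _mm_load_si128((const __m128i *)p);
}

/* Hypothetical helper: load 8 shorts by plain pointer dereference. */
static __m128i load_deref(const short *p)
{
    assert(((uintptr_t)p & 15) == 0);
    return *(const __m128i *)p;
}

/* Returns 1 if both load styles produce identical register contents. */
static int loads_agree(const short *p)
{
    __m128i a = load_explicit(p);
    __m128i b = load_deref(p);
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Scalar reference to compare a SIMD maximum against. */
static int scalar_max_of_64(const short s64[])
{
    int m = s64[0];
    for (int i = 1; i < 64; ++i)
        if (s64[i] > m)
            m = s64[i];
    return m;
}
```

To test getMaxOf64, fill a 16-byte-aligned array of 64 shorts (including negative values, to exercise the sign-extending cast) and compare its result with scalar_max_of_64 on the same data.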
Some additional information on porting SIMD code to AMD64 can be found in this AMD PDF:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_MMCodec_amd64.pdf
Gerd