Computer Chess Club Archives

Subject: Re: On SSE2-Intrinsics continuation

Author: Gerd Isenberg

Date: 02:13:43 04/01/05

It seems that the _mm_load_si128(__m128i*) intrinsic, which loads an xmm
register variable of type __m128i from memory, is not necessary - at least with
the msvc 2005 beta compiler I use?! One can simply assign the content of a
__m128i pointer to that variable in the usual C manner, which produces the same
16-byte-aligned movdqa instructions.
Hmm... the code immediately becomes more readable, like this max gem:

#include <emmintrin.h> // SSE2 intrinsics

int getMaxOf64(short s64[] /* aligned 16 */)
{
  __m128i* ps = (__m128i*) s64;
  __m128i x0 = ps[0]; // first 8 shorts
  __m128i x1 = ps[1]; // next 8 shorts
  // vertical max over the remaining six vectors, in two independent chains
  x0 = _mm_max_epi16 (x0, ps[2]);
  x1 = _mm_max_epi16 (x1, ps[3]);
  x0 = _mm_max_epi16 (x0, ps[4]);
  x1 = _mm_max_epi16 (x1, ps[5]);
  x0 = _mm_max_epi16 (x0, ps[6]);
  x1 = _mm_max_epi16 (x1, ps[7]);
  x1 = _mm_max_epi16 (x1, x0); // merge both chains
  // horizontal max: byte shifts by 2, 4 and 8 fold all 8 shorts into short 0
  x0 = _mm_max_epi16 (x1, _mm_srli_si128 (x1, 2));
  x1 = _mm_max_epi16 (x0, _mm_srli_si128 (x0, 4));
  x0 = _mm_max_epi16 (x1, _mm_srli_si128 (x1, 8));
  // short cast is necessary for sign extension
  return (short)_mm_extract_epi16 (x0, 0);
}
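
For anyone who wants to convince themselves that the plain assignment and the explicit load give the same result, here is a minimal sketch, not the code from above. The names Shorts64, max64_scalar, max64_load, max64_deref, hmax_epi16 and variants_agree are made up for illustration. It only compares return values against a scalar loop; whether a given compiler really emits the same movdqa for both variants has to be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* 64 shorts overlaid on eight __m128i, so the buffer is 16-byte aligned. */
typedef union {
    __m128i v[8];
    short   s[64];
} Shorts64;

/* Scalar reference: plain loop over all 64 values. */
static int max64_scalar(const short *s)
{
    int best = s[0];
    for (int i = 1; i < 64; ++i)
        if (s[i] > best) best = s[i];
    return best;
}

/* Folds the 8 signed 16-bit lanes of x into lane 0 and returns that lane. */
static int hmax_epi16(__m128i x)
{
    x = _mm_max_epi16(x, _mm_srli_si128(x, 8)); /* lanes 0..3 vs 4..7 */
    x = _mm_max_epi16(x, _mm_srli_si128(x, 4)); /* lanes 0..1 vs 2..3 */
    x = _mm_max_epi16(x, _mm_srli_si128(x, 2)); /* lane 0 vs lane 1   */
    /* pextrw zero-extends, so cast to short to get the sign back */
    return (short)_mm_extract_epi16(x, 0);
}

/* Variant A: explicit _mm_load_si128 for every 16-byte block. */
static int max64_load(const short *s)
{
    assert(((uintptr_t)s & 15) == 0); /* aligned loads need 16-byte alignment */
    const __m128i *p = (const __m128i *)s;
    __m128i m = _mm_load_si128(p);
    for (int i = 1; i < 8; ++i)
        m = _mm_max_epi16(m, _mm_load_si128(p + i));
    return hmax_epi16(m);
}

/* Variant B: plain dereference of the __m128i pointer, as in the post. */
static int max64_deref(const short *s)
{
    assert(((uintptr_t)s & 15) == 0);
    const __m128i *p = (const __m128i *)s;
    __m128i m = p[0];
    for (int i = 1; i < 8; ++i)
        m = _mm_max_epi16(m, p[i]);
    return hmax_epi16(m);
}

/* Returns 1 if both SSE2 variants match the scalar reference on d. */
static int variants_agree(const Shorts64 *d)
{
    int ref = max64_scalar(d->s);
    return max64_load(d->s) == ref && max64_deref(d->s) == ref;
}
```

Fill a Shorts64 with test values, including an all-negative set to exercise the sign extension, and call variants_agree on it. The union is just one way to get the 16-byte alignment without compiler-specific attributes.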

Some additional information on porting SIMD code to AMD64 is in this AMD pdf:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_MMCodec_amd64.pdf

Gerd

Last modified: Thu, 15 Apr 21 08:11:13 -0700