Computer Chess Club Archives

Subject: Re: On SSE2-Intrinsics

Author: Gerd Isenberg

Date: 01:23:02 04/09/05

>I guess you are simply looking for the “_mm_cvtsi32_si128” intrinsic. For
>example
>
>    __m128i a;
>    void foo(int *p) {
>      a = _mm_cvtsi32_si128(*p);
>    }
>
>translates into
>
>     movd      xmm0, DWORD PTR [eax] ; clears higher 96-bit
>     movdqa    XMMWORD PTR _a, xmm0
>
>where, for your request, you would feed xmm0 into the shift factor rather than
>storing it.


Ah, a conversion intrinsic. Exactly what I was looking for, thanks.
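
A minimal sketch of what feeding the converted value into a shift could look like, assuming the goal is to shift both 64-bit lanes by a count known only at run time. The helper name shift_left_both_lanes is made up for illustration; _mm_cvtsi32_si128 and _mm_sll_epi64 are the standard SSE2 intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: shift both 64-bit lanes of x left by a runtime count.
   _mm_cvtsi32_si128 places the int in the low 32 bits of an XMM register and
   zeroes the upper 96 bits, so the low 64 bits hold exactly the count that
   _mm_sll_epi64 reads. */
static __m128i shift_left_both_lanes(__m128i x, int count)
{
    __m128i cnt = _mm_cvtsi32_si128(count);
    return _mm_sll_epi64(x, cnt);  /* counts above 63 give all-zero lanes */
}
```

The count stays in an XMM register throughout, so nothing has to be stored to memory in between.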


>
>>Is it ok, to use assign-operator
>
>Looks okay to me.

Ok. Maybe the crux with __m128i is this: if the usual C assignment and array
access operators work on it, why not the bitwise operators "and", "or" and "xor"?
Explicit load and store intrinsics at least make clear that this is an intrinsic
data type to which the usual operators cannot be applied.
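
To illustrate the point, a small sketch of how the bitwise operations are spelled with intrinsics. The helper name and_xor_or is hypothetical; some compilers may accept the plain operators on vector types as an extension, but that is not assumed here.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: computes (a & b) ^ (a | b) on all 128 bits.
   Each C operator is replaced by its SSE2 intrinsic counterpart. */
static __m128i and_xor_or(__m128i a, __m128i b)
{
    __m128i both   = _mm_and_si128(a, b);   /* a & b */
    __m128i either = _mm_or_si128(a, b);    /* a | b */
    return _mm_xor_si128(both, either);     /* same bits as a ^ b */
}
```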

>
>>Guess generating C-code with intrinsics is your way to implement vectorisation
>>with intel C ;-)
>
>I am not sure what you mean by this.

My naive guess was that Intel C produces an intermediate stream in a first pass
(like a preprocessor does) with XMM intrinsics to target vectorisation, so that
the native compiler or code generator only has to deal with XMM intrinsics as the
central instance for XMM register allocation and code scheduling.


>In general, I promote automatic
>vectorization with the Intel compiler over heavy use of intrinsics, but I
>realize that this is not a viable option for expert programmers as yourself yet
>:-)

Hehe - for some very special cases, often related to patterns like
sign-extending bits to bytes or rotating 8*8 byte arrays ;-)

for (row = 0; row < 8; ++row)
  for (col = 0; col < 8; ++col)
    target[row][col] = source[col][row];


// ~26 AMD64 cycles (inlined), ~3.x times faster than the bytewise C code above
void rotate8x8bytes ( BYTE target[], const BYTE source[] )
{
  __m128i x0, x1, x2, x3, y0, y1, y2, y3;
  __m128i* ps = (__m128i*) source;
  __m128i* pt = (__m128i*) target;

  x0 = ps[0];
  x1 = ps[1];
  x2 = ps[2];
  x3 = ps[3];

  y0 = _mm_unpackhi_epi64 (x0, x0);
  y1 = _mm_unpackhi_epi64 (x1, x1);
  y2 = _mm_unpackhi_epi64 (x2, x2);
  y3 = _mm_unpackhi_epi64 (x3, x3);

  x0 = _mm_unpacklo_epi8  (x0, y0);
  x1 = _mm_unpacklo_epi8  (x1, y1);
  x2 = _mm_unpacklo_epi8  (x2, y2);
  x3 = _mm_unpacklo_epi8  (x3, y3);

  y0 = _mm_unpacklo_epi16 (x0, x1);
  y1 = _mm_unpackhi_epi16 (x0, x1);
  y2 = _mm_unpacklo_epi16 (x2, x3);
  y3 = _mm_unpackhi_epi16 (x2, x3);

  pt[0] = _mm_unpacklo_epi32 (y0, y2);
  pt[1] = _mm_unpackhi_epi32 (y0, y2);
  pt[2] = _mm_unpacklo_epi32 (y1, y3);
  pt[3] = _mm_unpackhi_epi32 (y1, y3);
}
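
The routine above reads and writes through __m128i pointers directly, which as far as I know compiles to aligned moves (movdqa) and therefore assumes both arrays start on 16-byte boundaries. Below is a sketch of a variant with explicit unaligned loads and stores, plus a check against the scalar loop. The names transpose8x8_scalar, transpose8x8_sse2_unaligned and transpose_versions_agree are made up for illustration, and BYTE is assumed to be unsigned char.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>     /* memcmp */

typedef unsigned char BYTE;  /* assumption: BYTE is an 8-bit unsigned type */

/* Scalar reference: target[row][col] = source[col][row] on flat 64-byte arrays. */
static void transpose8x8_scalar(BYTE target[64], const BYTE source[64])
{
    int row, col;
    for (row = 0; row < 8; ++row)
        for (col = 0; col < 8; ++col)
            target[row * 8 + col] = source[col * 8 + row];
}

/* Same unpack sequence as above, but with unaligned loads and stores. */
static void transpose8x8_sse2_unaligned(BYTE target[64], const BYTE source[64])
{
    const __m128i* ps = (const __m128i*) source;
    __m128i* pt = (__m128i*) target;
    __m128i x0, x1, x2, x3, y0, y1, y2, y3;

    x0 = _mm_loadu_si128(ps + 0);  /* rows 0 and 1 */
    x1 = _mm_loadu_si128(ps + 1);  /* rows 2 and 3 */
    x2 = _mm_loadu_si128(ps + 2);  /* rows 4 and 5 */
    x3 = _mm_loadu_si128(ps + 3);  /* rows 6 and 7 */

    /* Interleave the two rows of each pair: 16-bit lane c = (even row[c], odd row[c]). */
    x0 = _mm_unpacklo_epi8(x0, _mm_unpackhi_epi64(x0, x0));
    x1 = _mm_unpacklo_epi8(x1, _mm_unpackhi_epi64(x1, x1));
    x2 = _mm_unpacklo_epi8(x2, _mm_unpackhi_epi64(x2, x2));
    x3 = _mm_unpacklo_epi8(x3, _mm_unpackhi_epi64(x3, x3));

    /* Combine pairs: each 32-bit lane now holds four rows of one column. */
    y0 = _mm_unpacklo_epi16(x0, x1);  /* rows 0..3, columns 0..3 */
    y1 = _mm_unpackhi_epi16(x0, x1);  /* rows 0..3, columns 4..7 */
    y2 = _mm_unpacklo_epi16(x2, x3);  /* rows 4..7, columns 0..3 */
    y3 = _mm_unpackhi_epi16(x2, x3);  /* rows 4..7, columns 4..7 */

    /* Join upper and lower halves: each store writes two complete columns. */
    _mm_storeu_si128(pt + 0, _mm_unpacklo_epi32(y0, y2));  /* columns 0, 1 */
    _mm_storeu_si128(pt + 1, _mm_unpackhi_epi32(y0, y2));  /* columns 2, 3 */
    _mm_storeu_si128(pt + 2, _mm_unpacklo_epi32(y1, y3));  /* columns 4, 5 */
    _mm_storeu_si128(pt + 3, _mm_unpackhi_epi32(y1, y3));  /* columns 6, 7 */
}

/* Returns 1 if both versions agree on a simple test pattern, 0 otherwise. */
static int transpose_versions_agree(void)
{
    BYTE src[64], a[64], b[64];
    int i;
    for (i = 0; i < 64; ++i)
        src[i] = (BYTE) i;
    transpose8x8_scalar(a, src);
    transpose8x8_sse2_unaligned(b, src);
    return memcmp(a, b, 64) == 0;
}
```

The unaligned loads are likely somewhat slower than the aligned version on hardware of this generation, so the cycle count quoted above should not be assumed to carry over.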


Cheers,
Gerd


>
>Aart Bik
>http://www.aartbik.com/

Re: On SSE2-Intrinsics Aart J.C. Bik 09:26:39 04/11/05

Last modified: Thu, 15 Apr 21 08:11:13 -0700