Computer Chess Club Archives


Subject: Re: Intel compiler, SIMD, and Bitboards

Author: Robert Hyatt

Date: 17:34:20 06/04/02



On June 04, 2002 at 19:05:43, Vincent Diepeveen wrote:

>On June 04, 2002 at 15:34:38, Sean Mintz wrote:
>
>This is all useless. First you must move the data into the MMX registers,
>and while it is there your program can't use floating point (which
>for chess programs won't be a problem, but for my other software it
>is a major problem). So that is an extra instruction to get it into MMX,
>then a context switch, and another extra instruction to get it back to
>the normal registers.
>
>That's a waste of time.

Note that you only have to do this _once_.  Then you can leave it alone
since nobody is using the FP hardware in that process.  When you context-
switch to another application, you have thousands of times as much overhead
in doing _other_ things (flushing cache, etc) as you do in resetting the FP
to normal mode.
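
A minimal sketch of the "only once" point, assuming a compiler that ships the MMX intrinsics in mmintrin.h and defines __MMX__ (GCC and Clang on x86 do). The function name xor_fold and the array keys are made up for illustration. All of the MMX work happens first, and _mm_empty() (the intrinsic for EMMS) is issued a single time at the end rather than after every operation. On compilers without __MMX__ the code falls back to plain C.

```c
#include <stdint.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* XOR together n 64-bit keys (a stand-in for a run of hash-key updates). */
uint64_t xor_fold(const uint64_t *keys, int n)
{
#if defined(__MMX__)
    __m64 acc = _mm_setzero_si64();
    for (int i = 0; i < n; i++) {
        /* Build a 64-bit MMX value from its high and low 32-bit halves. */
        __m64 k = _mm_set_pi32((int)(uint32_t)(keys[i] >> 32),
                               (int)(uint32_t)keys[i]);
        acc = _mm_xor_si64(acc, k);
    }
    /* Move the result back to the general registers, 32 bits at a time. */
    uint32_t lo = (uint32_t)_mm_cvtsi64_si32(acc);
    uint32_t hi = (uint32_t)_mm_cvtsi64_si32(_mm_srli_si64(acc, 32));
    _mm_empty();  /* EMMS: issued once, after the whole loop */
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumed fallback for targets without MMX: ordinary 64-bit XOR. */
    uint64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc ^= keys[i];
    return acc;
#endif
}
```

If the process never touches the x87 FP stack, even the single _mm_empty() only matters as hygiene before calling into code that might.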





>
>In short it is only helpful if you program in assembly and see the
>registers as something extra to use.

Or if the compiler suddenly becomes "aware"...


>
>As soon as you get to the point where you have mixed data issues,
>such as something in eax which you want to combine with mm1,
>then you have major problems, as you have to use an extra instruction to
>get eax into mm1.

That is ok.  Compare the time to do that to the hundreds of clock cycles
needed to get it from memory.
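
A sketch of what that extra instruction looks like from C, again assuming mmintrin.h and __MMX__ are available. The function set_square is a hypothetical example, not code from any engine. The square number starts out as an ordinary int (the "eax" side), _mm_cvtsi32_si64() moves it into an MMX register with a single register-to-register move, and it is then used there as a shift count.

```c
#include <stdint.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Return bb with bit sq (0..63) set. */
uint64_t set_square(uint64_t bb, int sq)
{
#if defined(__MMX__)
    __m64 board = _mm_set_pi32((int)(uint32_t)(bb >> 32), (int)(uint32_t)bb);
    __m64 count = _mm_cvtsi32_si64(sq);   /* integer register -> MMX register */
    __m64 one   = _mm_cvtsi32_si64(1);
    /* Shift 1 left by the count held in an MMX register, then OR it in. */
    board = _mm_or_si64(board, _mm_sll_si64(one, count));
    uint32_t lo = (uint32_t)_mm_cvtsi64_si32(board);
    uint32_t hi = (uint32_t)_mm_cvtsi64_si32(_mm_srli_si64(board, 32));
    _mm_empty();  /* leave MMX state before returning to ordinary code */
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumed fallback for targets without MMX. */
    return bb | ((uint64_t)1 << sq);
#endif
}
```

Whether this beats the plain integer version is something only a measurement on the target CPU can answer; the sketch only shows that the transfer itself is one intrinsic call.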



>
>Now suppose you manage to get them to run independently. The question
>then is: how do you synchronize the two in time?
>
>Because the mmx instructions can get executed at a different speed
>than the normal register instructions.
>
>you don't want to already get a result from eax before the previous
>instructions in eax have finished.
>
>you don't want to already zobrist hash this piece into mm1 before
>you know for sure that the current hashing has been written to eax:edx
>
>Getting this all to work in a C program is *not* trivial.
>
>In fact it'll slow down one's program. We must wait for the Hammer
>to be able to do more useful things, I fear.
>
>>I was talking w/ Aaron Gordon and he found some interesting stuff in the intel c
>>compiler guide about ''intrinsics''.
>>---------------
>>''The major benefit of using intrinsics is that you now have access to key
>>features that are not available using conventional coding practices. Intrinsics
>>enable you to code with the syntax of C function calls and variables instead of
>>assembly language. Most MMX(TM) technology, Streaming SIMD Extensions, and
>>Streaming SIMD Extensions 2 instructions have a corresponding C intrinsic that
>>implements that instruction directly. This frees you from managing registers and
>>enables the compiler to optimize the instruction scheduling.
>>
>>The MMX technology and Streaming SIMD Extension instructions use the following
>>new features:
>>
>>New Registers--Enable packed data of up to 128 bits in length for optimal SIMD
>>processing.
>>
>>New Data Types--Enable packing of up to 16 elements of data in one register.''
>>---------------
>>Here are the data types:
>>---------------
>>''__m64 Data Type
>>The __m64 data type is used to represent the contents of an MMX register, which
>>is the register that is used by the MMX technology intrinsics. The __m64 data
>>type can hold eight 8-bit values, four 16-bit values, two 32-bit values, or one
>>64-bit value.
>>
>>__m128 Data Types
>>The __m128 data type is used to represent the contents of a Streaming SIMD
>>Extension register used by the Streaming SIMD Extension intrinsics. The __m128
>>data type can hold four 32-bit floating values.
>>
>>The __m128d data type can hold two 64-bit floating-point values.
>>
>>The __m128i data type can hold sixteen 8-bit, eight 16-bit, four 32-bit, or two
>>64-bit integer values.
>>
>>The compiler aligns __m128 local and global data to 16-byte boundaries on the
>>stack. To align integer, float, or double arrays, you can use the declspec
>>statement.''
>>---------------
>>Prototypes for these intrinsics and some related macros and constants are in the
>>header file xmmintrin.h.
>>
>>I think it'd be interesting to see if any speedup can be achieved by using these
>>data types. Can anyone run some tests to find out? It would seem to me that if
>>we can hold 64-bit values in a single register (using __m64) then we should see
>>a 2x speedup in some cases. Hope this helps some people.
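
For anyone who wants to run that test, here is a small sketch of a bitboard operation written with the __m128i type from the quoted guide. It assumes SSE2 is available and that the integer intrinsics come from emmintrin.h (as far as I know, xmmintrin.h covers only the original SSE set). The function andnot_pair is a made-up name. It computes a & ~b on two 64-bit bitboards with a single 128-bit operation and falls back to plain C otherwise.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = a[i] & ~b[i] for i = 0, 1: two bitboards per operation. */
void andnot_pair(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
#if defined(__SSE2__)
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* _mm_andnot_si128(x, y) computes (~x) & y. */
    _mm_storeu_si128((__m128i *)out, _mm_andnot_si128(vb, va));
#else
    /* Assumed fallback for targets without SSE2. */
    out[0] = a[0] & ~b[0];
    out[1] = a[1] & ~b[1];
#endif
}
```

Timing this against the scalar fallback inside a real move generator would show whether the hoped-for speedup appears; the sketch makes no claim either way.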




Last modified: Thu, 15 Apr 21 08:11:13 -0700
