Author: Robert Hyatt
Date: 17:34:20 06/04/02
On June 04, 2002 at 19:05:43, Vincent Diepeveen wrote:

>On June 04, 2002 at 15:34:38, Sean Mintz wrote:
>
>this is all useless. first you must put it to the mmx registers
>and at the same time your program can't use floating point (which
>for chessprograms won't be a problem but for my other software this
>is a major problem). but that extra instruction to get it to mmx
>then a context switch and another extra instruction to get it back to
>the normal registers.
>
>That's a waste of time.

Note that you only have to do this _once_. Then you can leave it alone since
nobody is using the FP hardware in that process. When you context-switch to
another application, you have thousands of times as much overhead in doing
_other_ things (flushing cache, etc) as you do in resetting the FP to normal
mode.

>
>In short it is only helpful if you program in assembly and see the
>registers as something extra to use.

Or if the compiler suddenly becomes "aware"...

>
>As soon as you get to the point where you have mixed data issues,
>such as something in eax which you want to use to combine with mm1
>then you have major problems as you gotta use extra instruction to
>get eax into mm1.

That is ok. Compare the time to do that to the hundreds of clock cycles needed
to get it from memory.

>
>Now suppose you manage to get them to run independently, the question
>which is there then is, how do you parallel time this all?
>
>Because the mmx instructions can get executed at a different speed
>than the normal register instructions.
>
>you don't want to already get a result from eax before the previous
>instructions in eax have finished.
>
>you don't want to already zobrist hash this piece into mm1 before
>you know for sure that the current hashing has been written to eax:edx
>
>Getting this all to work into a C program is *not* trivial.
>
>In fact it'll slow down one's program. We must wait till the Hammer
>to be able to do more useful things I fear.
>
>>I was talking w/ Aaron Gordon and he found some interesting stuff in the intel c
>>compiler guide about "intrinsics".
>>---------------
>>"The major benefit of using intrinsics is that you now have access to key
>>features that are not available using conventional coding practices. Intrinsics
>>enable you to code with the syntax of C function calls and variables instead of
>>assembly language. Most MMX(TM) technology, Streaming SIMD Extensions, and
>>Streaming SIMD Extensions 2 intrinsics have a corresponding C intrinsic that
>>implements that instruction directly. This frees you from managing registers and
>>enables the compiler to optimize the instruction scheduling.
>>
>>The MMX technology and Streaming SIMD Extension instructions use the following
>>new features:
>>
>>New Registers--Enable packed data of up to 128 bits in length for optimal SIMD
>>processing.
>>
>>New Data Types--Enable packing of up to 16 elements of data in one register."
>>---------------
>>Here are the data types:
>>---------------
>>"__m64 Data Type
>>The __m64 data type is used to represent the contents of an MMX register, which
>>is the register that is used by the MMX technology intrinsics. The __m64 data
>>type can hold eight 8-bit values, four 16-bit values, two 32-bit values, or one
>>64-bit value.
>>
>>__m128 Data Types
>>The __m128 data type is used to represent the contents of a Streaming SIMD
>>Extension register used by the Streaming SIMD Extension intrinsics. The __m128
>>data type can hold four 32-bit floating values.
>>
>>The __m128d data type can hold two 64-bit floating-point values.
>>
>>The __m128i data type can hold sixteen 8-bit, eight 16-bit, four 32-bit, or two
>>64-bit integer values.
>>
>>The compiler aligns __m128 local and global data to 16-byte boundaries on the
>>stack. To align integer, float, or double arrays, you can use the declspec
>>statement."
>>---------------
>>Prototypes for these intrinsics and some related macros and constants are in the
>>header file xmmintrin.h.
>>
>>I think it'd be interesting to see if any speedup can be achieved by using these
>>data types. Can anyone run some tests to find out? It would seem to me that if
>>we can hold 64 bit values (using __m64) then we should see a 2x speedup in some
>>cases. Hope this helps some people.
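
For anyone who wants to try the test asked for above, here is a minimal sketch of a 64-bit hash-key XOR done both in plain C and with the __m64 intrinsics. It assumes a compiler that ships mmintrin.h and defines __MMX__ (gcc and clang on x86 do); on anything else it falls back to the plain version. The function names are made up for illustration and are not from any engine. It makes no claim about which version is faster, it only shows where the extra moves into and out of the MMX registers and the one-time FP reset show up.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__MMX__)
#include <mmintrin.h>   /* __m64, _mm_set_pi32, _mm_xor_si64, _mm_empty */
#endif

/* Plain C: a 64-bit XOR. A 32-bit x86 compiler typically splits
   this across a register pair such as eax:edx. */
uint64_t hash_toggle_plain(uint64_t key, uint64_t piece_key)
{
    return key ^ piece_key;
}

/* Same update done in an MMX register when MMX is available.
   (Hypothetical helper name; falls back to plain C otherwise.) */
uint64_t hash_toggle_mmx(uint64_t key, uint64_t piece_key)
{
#if defined(__MMX__)
    /* Extra work #1: get both operands into MMX registers. */
    __m64 a = _mm_set_pi32((int)(uint32_t)(key >> 32),
                           (int)(uint32_t)key);
    __m64 b = _mm_set_pi32((int)(uint32_t)(piece_key >> 32),
                           (int)(uint32_t)piece_key);

    __m64 r = _mm_xor_si64(a, b);   /* the single 64-bit XOR */

    /* Extra work #2: get the result back out, 32 bits at a time. */
    uint32_t lo = (uint32_t)_mm_cvtsi64_si32(r);
    uint32_t hi = (uint32_t)_mm_cvtsi64_si32(_mm_srli_si64(r, 32));

    _mm_empty();                    /* emms: hand the registers back to x87 FP */
    return ((uint64_t)hi << 32) | lo;
#else
    return hash_toggle_plain(key, piece_key);
#endif
}

/* Sanity check: both paths agree, and toggling twice undoes itself. */
void hash_toggle_selfcheck(void)
{
    uint64_t key = 0x0123456789abcdefULL;
    uint64_t piece = 0xfedcba9876543210ULL;
    assert(hash_toggle_mmx(key, piece) == hash_toggle_plain(key, piece));
    assert(hash_toggle_mmx(hash_toggle_mmx(key, piece), piece) == key);
}
```

In a real program the key would stay in the MMX register across many updates and _mm_empty() would be called once at the end, which is the "only once" case described above. Calling it per update, as this sketch does for safety, is the worst case.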
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.