Computer Chess Club Archives


Subject: Re: Help! Visual C++ intrinsics! (Nalimov, are you around here somewhere

Author: Anthony Cozzie

Date: 16:40:59 10/01/03


On October 01, 2003 at 16:00:12, Gerd Isenberg wrote:

>On October 01, 2003 at 14:36:25, Anthony Cozzie wrote:
>
>>I personally (as the author of a bitboard engine) find MMX/SSE to offer a lot of
>>possibilities.  Much of the time they are of little use, as the data is used
>>immediately after computation, but if it is not, we can save quite a bit: 8
>>extra registers, and more throughput as well.  Plus, I like the feeling of using
>>all the resources of the machine ;)
>
>Hi Anthony,
>
>absolutely same for me.
>
>>
>>I have been experimenting with the use of intrinsics instead of inline assembly.
>
>I have never tried MMX intrinsics so far, because of the "strange" intrinsic types.
>So I'm very interested in your results.
>
>
>> I feel that intrinsics can offer several advantages over inline assembly:
>>  1. Compiler can interleave other instructions with MMX code (big)
>>  2. More portable: can typedef things, and in general only have to write it
>>once (as opposed to 1 time for GCC/GAS/ATT, and 1 time for VC++/Intel)
>>  3. On platforms without MMX, the code can easily be converted to the standard
>>64 bit "fake integer" operations.
>>
>>In other words, a given piece of code can be written once instead of three
>>times, and be faster to boot. Unfortunately, this is the theory.
>>
>
>Intrinsics, my "hope" to use sse2 for amd64 ;-)
>
>I'm thinking about a Kogge-Stone C source generator, including
>SSE2 intrinsics, where I can produce (combined) attack routines with some source
>and target structures. The generator may be adjustable to produce pure C++ or
>pure SSE2 intrinsics and properly scheduled intermediates, especially using SSE2
>for right rank attacks via eight bytewise x ^ x - 2.
>

I have never considered something like this, but it would actually be really
easy: you give it Intel-syntax assembly, and it spits out Intel + AT&T + C. It
would be a weekend project.
Still, I wish Nalimov would stop by.
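
One reading of the "eight bytewise x ^ x - 2" that Gerd mentions is the usual o ^ (o - 2r) subtraction trick applied to each rank byte, with o the occupancy and r the sliders. That reading is an assumption, not something spelled out in the thread. A rough portable C++ sketch follows; the name rank_attacks_up and the layout (one byte per rank, attacks running toward higher bit indices) are made up for illustration. An SSE2 version would presumably do the same thing with _mm_add_epi8, _mm_sub_epi8 and _mm_xor_si128 on sixteen byte lanes at once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: per-rank o ^ (o - 2r), one byte per rank.
// occ     - occupancy bitboard, must contain every slider
// sliders - rooks/queens whose rank attacks we want
// Returns the squares attacked toward higher bit indices within each
// rank, up to and including the first blocker.
inline std::uint64_t rank_attacks_up(std::uint64_t occ, std::uint64_t sliders) {
    assert((sliders & ~occ) == 0);  // sliders must be part of the occupancy
    std::uint64_t attacks = 0;
    for (int rank = 0; rank < 8; ++rank) {
        const int shift = rank * 8;
        const std::uint8_t o = static_cast<std::uint8_t>(occ >> shift);
        const std::uint8_t r = static_cast<std::uint8_t>(sliders >> shift);
        // 8-bit wraparound subtraction: the borrow stops at the first
        // blocker above the slider and never leaves the byte.
        const std::uint8_t diff = static_cast<std::uint8_t>(o - 2 * r);
        const std::uint8_t att = static_cast<std::uint8_t>(o ^ diff);
        attacks |= static_cast<std::uint64_t>(att) << shift;
    }
    return attacks;
}
```

With occupancy 0x92 on the first rank and a rook on bit 1, rank_attacks_up(0x92, 0x02) gives 0x1C: the two empty squares plus the first blocker.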

>>Problem 1: MSVC++ seems to do a horrible job at generating assembly
>>
>>First, I wrote some code that used 10 intrinsic variables in one function.  I
>>had already figured out how to do this A) without any register spilling (1
>>load/variable) and B) using a minimum of movq mm0, mm1 type stuff. (4
>>instructions of this type). When I gave the intrinsic version to MSVC++, it
>>generated about 20 of these wasteful instructions.  It also generated the
>>following little gem:
>>
>>004021FF  por         mm5,mm0
>>00402202  movq        mm0,mm5
>>
>>Which could easily be replaced by the _single instruction_ por mm0, mm5
>
>Hmm...

>>
>>Problem 2: MSVC++ insists on moving the data into the intrinsic variables
>>
>>example:
>>		bitboard_to_intrinsic(ibbxai, ibbxa);
>>0040215E  mov         eax,dword ptr [esp+4]
>>00402162  mov         dword ptr [esp+34h],ecx
>>...
>>
>>Basically, any time I want to use their intrinsics, it means I have to first
>>copy the data from the stack to another place on the stack, and then load it
>>into the MMX registers, which obviously defeats the whole purpose of the
>>optimization.  I have a suspicion that this is because the __m64 datatype is
>>defined with __declspec(align(8)).  Has anyone tried intrinsics and run into
>>these before?  Does GCC/Intel C do a better job?  It seems to me like this is
>>_really_ bad code generation: anyone who is using intrinsics is going to be
>>doing it for performance, and this clearly is slow. I am using MSVC 7.1.
>>
>>Anthony
>
>That all sounds very strange.
>
>Have you tried aligned (structs of) unions of __int64 and __m64,
>probably globals or static class members - and to access via pointer, to load
>and store some mmx-registers?

I can, but I'd have to replace thousands of instances in my program.  I could
try a test run sometime.
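
For reference, Gerd's suggestion could look roughly like the sketch below: an 8-byte-aligned union of a 64-bit integer and __m64, kept in globals and touched only through pointers. The names BitboardSlot, slot_or, slot_done, the g_* globals and the BB_USE_MMX switch are all hypothetical; without the switch the code falls back to plain 64-bit integer operations. Reading one union member after writing the other is type punning, which compilers such as GCC document as supported but ISO C++ does not guarantee. Whether this actually avoids the extra stack copies in MSVC 7.1 is not tested here.

```cpp
#include <cassert>
#include <cstdint>

#if defined(BB_USE_MMX)   // hypothetical switch: define on x86 builds with MMX
#include <mmintrin.h>
#endif

// One bitboard, viewable as an integer or (optionally) as an MMX value.
// alignas(8) is C++11; an older MSVC would need __declspec(align(8)).
union alignas(8) BitboardSlot {
    std::uint64_t u64;
#if defined(BB_USE_MMX)
    __m64 m64;
#endif
};

// Hypothetical globals, as in "probably globals or static class members".
BitboardSlot g_white, g_black, g_all;

// dst = a | b, accessed via pointer so the operands stay where they live.
inline void slot_or(BitboardSlot* dst, const BitboardSlot* a, const BitboardSlot* b) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 8 == 0);
#if defined(BB_USE_MMX)
    dst->m64 = _mm_or_si64(a->m64, b->m64);  // por
#else
    dst->u64 = a->u64 | b->u64;              // plain 64-bit fallback
#endif
}

// Call once after a run of MMX work, before any x87 floating point.
inline void slot_done() {
#if defined(BB_USE_MMX)
    _mm_empty();                             // emms
#endif
}
```

Typical use would be to fill g_white.u64 and g_black.u64, call slot_or(&g_all, &g_white, &g_black), then slot_done() and read g_all.u64.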

>Some float (x87) interactions?
>
>Hopefully these redundant copies and the lousy code generation are the result of a
>skipped or not yet available optimization run ;-)
>
>Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700
