Author: Anthony Cozzie
Date: 16:40:59 10/01/03
On October 01, 2003 at 16:00:12, Gerd Isenberg wrote:

>On October 01, 2003 at 14:36:25, Anthony Cozzie wrote:
>
>>I personally (as the author of a bitboard engine) find MMX/SSE to offer a lot of
>>possibilities. Much of the time they are of little use, as the data is used
>>immediately after computation, but if it is not, we can save quite a bit: 8
>>extra registers, and more throughput as well. Plus, I like the feeling of using
>>all the resources of the machine ;)
>
>Hi Anthony,
>
>absolutely same for me.
>
>>
>>I have been experimenting with the use of intrinsics instead of inline assembly.
>
>I never tried mmx-intrinsics so far - due to the "strange" intrinsic types.
>Therefore i'm very interested in your results.
>
>
>> I feel that intrinsics can offer several advantages over inline assembly:
>> 1. Compiler can interleave other instructions with MMX code (big)
>> 2. More portable: can typedef things, and in general only have to write it
>>once (as opposed to 1 time for GCC/GAS/ATT, and 1 time for VC++/Intel)
>> 3. On platforms without MMX, the code can easily be converted to the standard
>>64 bit "fake integer" operations.
>>
>>In other words, a given piece of code can be written once instead of three
>>times, and be faster to boot. Unfortunately, this is the theory.
>>
>
>Intrinsics, my "hope" to use sse2 for amd64 ;-)
>
>I'm thinking about a kogge stone c-source generator, including a
>sse2-intrinsics, where i can produce (combined) attack routines with some source
>and target structures. The generator may be adjustable to produce pure C++ or
>pure sse2-intrinsics and properly scheduled intermediates, specially using sse2
>for right rank attacks via eight bytewise x ^ x - 2.
>

I have never considered something like this, but it would actually be really
easy. You give it Intel-syntax assembly, and it spits out Intel + AT&T + C. It
would be a weekend project. Still, I wish Nalimov would stop by.

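For anyone following along, here is a minimal sketch of what the pure C output
of a Kogge-Stone generator like the one Gerd describes might look like for a
single direction (east, toward the h-file). The function name east_attacks, the
square mapping (a1 = bit 0, h1 = bit 7) and the file mask are assumptions made
for illustration, not anything taken from Gerd's generator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example routine: east (toward the h-file) sliding attacks
   computed with a Kogge-Stone parallel prefix fill.
   Assumed mapping: a1 = bit 0, h1 = bit 7, a2 = bit 8, ... */
uint64_t east_attacks(uint64_t rooks, uint64_t empty)
{
    const uint64_t not_a_file = 0xfefefefefefefefeULL; /* stops wrap h -> a */
    uint64_t gen = rooks;               /* generator: the sliding pieces    */
    uint64_t pro = empty & not_a_file;  /* propagator: squares to pass over */

    assert((rooks & empty) == 0);       /* a rook's own square is not empty */

    gen |= pro & (gen << 1);            /* extend the fill by 1 square  */
    pro &= pro << 1;
    gen |= pro & (gen << 2);            /* ... by 2 more squares        */
    pro &= pro << 2;
    gen |= pro & (gen << 4);            /* ... by 4 more squares        */

    /* One final step turns the fill into an attack set: it drops the rook
       squares themselves and adds the first blocker in each ray. */
    return (gen << 1) & not_a_file;
}
```

With a rook on a1 and a blocker on e1, this returns b1 through e1; the blocker
square is included so captures come out of the same mask. An SSE2 variant could
in principle run the same steps on two bitboards per register.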
>>Problem 1: MSVC++ seems to do a horrible job at generating assembly
>>
>>First, I wrote some code that used 10 intrinsic variables in one function. I
>>had already figured out how to do this A) without any register spilling (1
>>load/variable) and B) using a minimum of movq mm0, mm1 type stuff. (4
>>instructions of this type). When I gave the intrinsic version to MSVC++, it
>>generated about 20 of these wasteful instructions. It also generated the
>>following little gem:
>>
>>004021FF por mm5,mm0
>>00402202 movq mm0,mm5
>>
>>Which could easily be replaced by the _single instruction_ por mm0, mm5
>
>Hmm...
>>
>>Problem 2: MSVC++ insists on moving the data into the intrinsic variables
>>
>>example:
>> bitboard_to_intrinsic(ibbxai, ibbxa);
>>0040215E mov eax,dword ptr [esp+4]
>>00402162 mov dword ptr [esp+34h],ecx
>>...
>>
>>Basically, any time I want to use their intrinsics, it means I have to first
>>copy the data from the stack to another place on the stack, and then load it
>>into the MMX registers, which obviously defeats the whole purpose of the
>>optimization. I have a suspicion that this is because the __m64 datatype is
>>defined with __declspec(align(8)). Has anyone tried intrinsics and run into
>>these before? Does GCC/Intel C do a better job? It seems to me like this is
>>_really_ bad code generation: anyone who is using intrinsics is going to be
>>doing it for performance, and this clearly is slow. I am using MSVC 7.1.
>>
>>Anthony
>
>That sounds all very strange.
>
>Have you tried aligned (structs of) unions of __int64 and __m64,
>probably globals or static class members - and to access via pointer, to load
>and store some mmx-registers?

I can, but I'd have to replace thousands of instances in my program. I could try
a test run sometime.

>Some float (x87) interactions?
>
>Hopefully these redundant copies and lousy codegeneration is a result of an
>skipped or not yet available optimization run ;-)
>
>Gerd
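
A rough sketch of the union Gerd suggests, in case it helps someone try it. The
names bb_slot, slot_or and USE_MMX are hypothetical, and uint64_t stands in for
__int64. The MMX path is only compiled when USE_MMX is defined on an x86
compiler that provides <mmintrin.h>; otherwise the same call falls back to a
plain 64-bit OR. This has not been checked against MSVC 7.1, so whether it
avoids the extra stack copies is an open question.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_MMX              /* opt-in: x86 compiler with <mmintrin.h> */
#include <mmintrin.h>
#endif

/* Hypothetical type: one bitboard, viewable as an integer or as __m64. */
typedef union {
    uint64_t q;             /* ordinary integer view */
#ifdef USE_MMX
    __m64 m;                /* MMX view of the same 8 bytes */
#endif
} bb_slot;

/* Globals, as suggested, so the data has one fixed home in memory. */
bb_slot g_white, g_black, g_all;

/* dst = a | b, working through pointers to the slots. */
void slot_or(bb_slot *dst, const bb_slot *a, const bb_slot *b)
{
    assert(dst && a && b);
#ifdef USE_MMX
    dst->m = _mm_or_si64(a->m, b->m);  /* intended to compile to a por */
    _mm_empty();                       /* emms, so later x87 code is safe */
#else
    dst->q = a->q | b->q;              /* 64-bit integer fallback */
#endif
}
```

Calling slot_or(&g_all, &g_white, &g_black) after filling in the .q members
gives the same result on either path. The idea is that the compiler can load
the MMX register straight from the slot rather than from a separate __m64
temporary.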
Last modified: Thu, 15 Apr 21 08:11:13 -0700