Author: Eugene Nalimov
Date: 10:38:06 01/14/02
On January 14, 2002 at 04:16:54, Ed Schröder wrote:
>On January 13, 2002 at 23:36:19, Eugene Nalimov wrote:
>
>>Can you please send me the function that was so badly compiled (probably via
>>e-mail)? I'd like to find where VC screwed up. It's too late to fix it for VC7,
>>but probably we can do it for VC7.x.
>>
>>Eugene
>
>
>Screwed up is a big word, ASM being just 30% faster than C is a very good
>performance I would say. From memory I recall the following cases:
>
>#1. a=b; c=d;
>
>The compiler will output something like:
>
>mov EAX,b
>mov a,EAX
>mov EAX,d
>mov c,EAX
>
>Whereas it should generate:
>
>mov EAX,b
>mov EBX,d
>mov a,EAX
>mov c,EBX
---- File c1.c:
int a, b, c, d;
void foo (void)
{
    a = b; c = d;
}
---- File c1.asm (compiled with "cl /Ox /Fa c1.c")
[Some assembly stuff deleted]
_foo PROC NEAR
; File c:\repro\c1.c
; Line 5
mov eax, DWORD PTR _b
mov ecx, DWORD PTR _d
mov DWORD PTR _a, eax
mov DWORD PTR _c, ecx
; Line 6
ret 0
_foo ENDP
>#2. Always these unavoidable MOVSX and MOVZX instructions. No compiler can
>optimize this because it is impossible, only the ASM programmer knows what it is
>allowed under the circumstances.
Sometimes you can use C casts to avoid those... But yes, here the assembly
programmer is definitely better.
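A minimal sketch of the kind of cast meant here, under assumptions: the names hist_count and score_square are hypothetical and not from anyone's engine. The byte is widened once, explicitly, into a register-sized local that is then reused, so the compiler does not have to extend it again at each use. Whether a given compiler really emits fewer MOVSX/MOVZX instructions has to be checked in the /Fa listing.

```c
/* Hypothetical table indexed by a byte-sized value (illustration only). */
static int hist_count[256];

/* board holds signed bytes. Used directly as an index, board[i] would be
   sign-extended and could even be negative. The cast says the value is
   meant as 0..255, and the unsigned local keeps the widened value so it
   can be reused without converting again. */
int score_square(const signed char *board, int i)
{
    unsigned sq = (unsigned char)board[i];  /* widen once */

    hist_count[sq]++;                       /* reuse the widened value */
    return hist_count[sq];
}
```

The cast does not remove the extension on the load itself, it only keeps it to a single one per value.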
>#3. Register use, same story as (2). I for instance use EBP and even ESP when I
>am short on registers.
VC, of course, uses EBP when it decides it's beneficial.
>#4. "char" use in MSVC, for instance: char a1,a2,a3,a4,a5,a6,a7,a8;
>
>Will NOT produce the 8 characters as a sequential memory block. So in case I
>want to zero the 8 bytes I will be forced to write 8 instructions. Some other
>compilers do generate a sequential memory block so you can redefine a1 and a5 as
>32-bit and with 2 instructions zero them. This is pretty crucial in a chess
>program, at least in mine, also because I have to "stack" a lot of stuff when going
>one ply deeper in the tree or when climbing back.
Never, never do that on PIII and especially on P4. For a detailed explanation
see, for example, the "Intel Pentium 4 and Intel Xeon Processor Optimization
Reference Manual", Section 1-22 "Store Forwarding".
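A rough sketch of a portable alternative, assuming the eight flags can be grouped in a struct (the names flags8 and clear_flags are made up for illustration). Struct members are laid out in declaration order, so the block is contiguous without relying on how the compiler places separate variables, and memset leaves the choice of store width to the compiler instead of aliasing chars as 32-bit values by hand.

```c
#include <string.h>

/* Hypothetical grouping of the eight flags from the post. */
struct flags8 {
    char a1, a2, a3, a4, a5, a6, a7, a8;
};

/* Clear all eight bytes in one call. No char is redefined as a 32-bit
   variable; the compiler decides how wide the actual stores are. */
void clear_flags(struct flags8 *f)
{
    memset(f, 0, sizeof *f);
}
```

This only removes the dependence on variable layout. Accessing the same bytes with different widths close together (for example a wide clear followed right away by byte loads) can still run into the store-forwarding restrictions the manual describes, so it is worth measuring.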
Eugene
>#5. Special stuff, no compiler is able to recognize as only the ASM programmer
>knows. I recently posted an example how to use the "indirect jump" the processor
>is offering you when for instance generating moves.
>
>So it is not about bugs, it is more about why no compiler will ever be able to beat an
>experienced ASM programmer. However I do think that there is room for
>improvement in the (1) and (4) case, maybe even on (3).
>
>Ed
>
>
>
>
>>On January 13, 2002 at 18:51:02, Ed Schröder wrote:
>>
>>>On January 13, 2002 at 16:29:21, Tom Kerrigan wrote:
>>>
>>>>On January 13, 2002 at 07:05:02, Ed Schröder wrote:
>>>>
>>>>>I have to disagree, I have a MSVC6 version of Rebel and it runs 30% slower than
>>>>>the ASM version.
>>>>
>>>>What do you attribute this difference to? Is it simply not possible to write C
>>>>that produces the same assembly as your hand-written code? Or do you take
>>>>certain liberties in the C code (perhaps in the name of readability?) that's
>>>>slowing things down?
>>>>
>>>>-Tom
>>>
>>>Just have a look at the ASM code MSVC6 produces, it often is bad stuff. By
>>>re-writing (optimizing) this "bad ASM stuff" I got my +30%.
>>>
>>>One ambiguous remark, don't believe everything commercials are telling you :)
>>>
>>>Ed