Computer Chess Club Archives


Subject: Re: Possible small improvement to hacky method

Author: Matt Taylor

Date: 02:39:16 12/08/02


On December 07, 2002 at 23:40:38, Walter Faxon wrote:

>On December 07, 2002 at 14:14:32, Matt Taylor wrote:
>
>>On December 06, 2002 at 22:50:40, Walter Faxon wrote:
>>
>>>On December 06, 2002 at 05:33:42, Matt Taylor wrote:
>>>
>>><snip>
>>>>
>>>>It's also a tad strange that the code loads dl and then copies edx into eax.
>>>>It would be more direct to simply store the table value in eax.
>>>>
>>><snip>
>>>
>>>I get the feeling that, for the compiler in question at least, once it decides
>>>that a register is going to be used as an address or offset, it loses or ignores
>>>its knowledge of the register's bitwise mapping.  It preps edx to receive the
>>>byte in dl, uses eax to load dl, then copies the whole thing to eax.  If it used
>>>eax to write to al directly, the compiler would think it still needs to mask out
>>>the (already zeroed) rest of eax afterwards.  So it does it this way because the
>>>reg-reg copy is faster and the edx prep can be overlapped with other work.
>>>Anyway, that's a possible explanation.  One would need detailed knowledge of the
>>>compiler to know for sure.  (And don't get mad at the compiler writers; writing
>>>good compilers is hard work!)
>>>
>>>-- Walter
>>
>>Yeah, but the x86 architecture has a movzx instruction for that very purpose.
>>AMD manuals actually advise that it is faster to use movzx than the equivalent
>>sequence...
>>
>>And yeah, I know compiler writing is very difficult. I have actually been
>>working on one for various reasons. The difference is that I have a human
>>optimizer. :-)
>>
>>Actually I was working on an optimizer that takes machine code and produces more
>>optimal machine code (which is what I will spit my compiler output through when
>>it's done).
>>
>>-Matt
>
>
>Man, it's clear I gotta get a PIV asm book (or the AMD equivalent).  All I got
>is an old 486 manual and that says movzx is slower than mov, 3 clocks to 1.
>
>A machine code optimizer is terrifically general.  Let us know when you're done;
>a lot of us will be interested.
>
>-- Walter

Yeah, movzx -used- to be slower. You won't find timing data past the Pentium.
The most comprehensive timing reference I've ever seen was documented by a
company called Quantasm. I can't find them on the web anymore and can only
presume they've gone out of business. However, I managed to find those old docs
and upload them to my website:
http://my.fit.edu/~mtaylor/opcode_i.html "Integer" ops (ALU/system)
http://my.fit.edu/~mtaylor/opcode_f.html FPU ops
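
To make the comparison concrete, here is a rough C sketch of the kind of byte-table lookup we've been talking about. The table contents and the names table, init_table, lookup, and lookup_masked are made up for illustration, and the assembly in the comments is only my paraphrase of the two sequences described above, not actual output from the compiler in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-entry byte table; the real table from the thread is not shown. */
static uint8_t table[256];

/* Fill the table with an arbitrary pattern, purely for illustration. */
static void init_table(void) {
    for (unsigned i = 0; i < 256; i++)
        table[i] = (uint8_t)(255u - i);
}

/*
 * Zero-extending byte load. For this function a compiler might emit the
 * three-instruction sequence discussed in the thread (illustrative only):
 *
 *     xor   edx, edx               ; prep edx
 *     mov   dl, [table + eax]      ; load the byte
 *     mov   eax, edx               ; copy the result to eax
 *
 * or the single instruction:
 *
 *     movzx eax, byte ptr [table + eax]
 */
static uint32_t lookup(uint32_t index) {
    assert(index < 256);
    return table[index];   /* uint8_t -> uint32_t conversion zero-extends */
}

/*
 * Same result with an explicit mask. The mask is redundant in C; it mirrors
 * a compiler that loads the byte and then clears the upper bits anyway.
 */
static uint32_t lookup_masked(uint32_t index) {
    assert(index < 256);
    uint32_t wide = table[index];
    return wide & 0xFFu;
}
```

Both functions return the same value for every index below 256; which instruction sequence you actually get depends on the compiler and its flags, so check the generated listing rather than trusting the comments.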

I dug up links to the latest manuals in case you're interested. Personally, I
prefer the Intel manuals as I find them easier to navigate. Intel Volume 2 has
an exhaustive instruction listing. AMD splits its instruction reference into
subsets (system, general, FPU/3DNow, MMX/SSE) across separate manuals.

Optimization references are a bit harder to come by. The only stuff I have is
specific to the Pentium 2/3. I might also have some early Athlon optimization
docs lying around somewhere...

AMD x86-64 manuals:
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_4699_875%5e7044,00.html?redir=CPX801

Intel P4 manuals:
http://developer.intel.com/design/pentium4/manuals/

FYI, you can order a hard copy of the x86-64 manuals from AMD's website for
free, last I checked. Most of the application-level material is identical to the
current x86 architecture. The only differences I can think of offhand, besides
the 64-bitness, are the REX encodings, RIP-relative addressing, and registers
r8-r15 and xmm8-xmm15.

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700
