Computer Chess Club Archives

Subject: Re: Possible small improvement to hacky method

Author: Matt Taylor
Date: 00:07:49 12/09/02
On December 09, 2002 at 01:59:51, Walter Faxon wrote:

>On December 08, 2002 at 05:39:16, Matt Taylor wrote:
>
>>On December 07, 2002 at 23:40:38, Walter Faxon wrote:
>>
>>>On December 07, 2002 at 14:14:32, Matt Taylor wrote:
>>>
>>>>On December 06, 2002 at 22:50:40, Walter Faxon wrote:
>>>>
>>>>>On December 06, 2002 at 05:33:42, Matt Taylor wrote:
>>>>>
>>>>><snip>
>>>>>>
>>>>>>It's also a tad strange that the code loads dl and then copies edx into eax.
>>>>>>It would be more direct to simply store the table value in eax.
>>>>>>
>>>>><snip>
>>>>>
>>>>>I get the feeling that, for the compiler in question at least, once it decides
>>>>>that a register is going to be used as an address or offset, it loses or ignores
>>>>>its knowledge of the register's bitwise mapping.  It preps edx to receive the
>>>>>byte in dl, uses eax to load dl, then copies the whole thing to eax.  If it used
>>>>>eax to write to al directly, the compiler would think it still needs to mask out
>>>>>the (already zeroed) rest of eax afterwards.  So it does it this way because the
>>>>>reg-reg copy is faster and the edx prep can be overlapped with other work.
>>>>>Anyway, that's a possible explanation.  One would need detailed knowledge of the
>>>>>compiler to know for sure.  (And don't get mad at the compiler writers; writing
>>>>>good compilers is hard work!)
>>>>>
>>>>>-- Walter
>>>>
>>>>Yeah, but the x86 architecture has a movzx instruction for that very purpose.
>>>>AMD manuals actually advise that it is faster to use movzx than the equivalent
>>>>sequence...
>>>>
>>>>And yeah, I know compiler writing is very difficult. I have actually been
>>>>working on one for various reasons. The difference is that I have a human
>>>>optimizer. :-)
>>>>
>>>>Actually I was working on an optimizer that takes machine code and produces more
>>>>optimal machine code (which is what I will spit my compiler output through when
>>>>it's done).
>>>>
>>>>-Matt
>>>
>>>
>>>Man, it's clear I gotta get a PIV asm book (or the AMD equivalent).  All I got
>>>is an old 486 manual and that says movzx is slower than mov, 3 clocks to 1.
>>>
>>>A machine code optimizer is terrifically general.  Let us know when you're done;
>>>a lot of us will be interested.
>>>
>>>-- Walter
>>
>>Yeah, movzx -used- to be slower. You won't find timing data past the Pentium.
>>The most comprehensive timing reference I've ever seen was documented by a
>>company called Quantasm. I can't find them on the web anymore and can only
>>presume they've gone out of business. However, I managed to find those old docs
>>and upload them to my website:
>>http://my.fit.edu/~mtaylor/opcode_i.html "Integer" ops (ALU/system)
>>http://my.fit.edu/~mtaylor/opcode_f.html FPU ops
>>
>>I dug up links to the latest manuals in case you're interested. Personally, I
>>prefer the Intel manuals as I find them easier to navigate. Intel Volume 2 has
>>an exhaustive instruction listing. AMD splits the instruction set across
>>separate manuals (system, general-purpose, FPU/3DNow, MMX/SSE).
>>
>>Optimization references are a bit harder to come by. The only stuff I have is
>>specific to the Pentium 2/3. I might also have some early Athlon optimization
>>docs lying around somewhere...
>>
>>AMD x86-64 manuals:
>>http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_4699_875%5e7044,00.html?redir=CPX801
>>
>>Intel P4 manuals:
>>http://developer.intel.com/design/pentium4/manuals/
>>
>>FYI you can order a hard copy of the x86-64 manuals from AMD's website for free
>>last I checked. Most of the application stuff is identical to current x86
>>architecture. The only differences I can think of offhand besides the 64-bitness
>>are the REX encodings, RIP-relative addressing, and registers r8-r15 and
>>xmm8-xmm15.
>>
>>-Matt
>
>
>Thanks much for the references, Matt!  Now got lots of stuff on order...

Quite welcome.
It happens to be my job to keep up with x86 internals. :-)

>Whatever happened to the RISC architectures that were supposed to make our lives
>so much easier?

I do remember the days when it seemed RISC would topple CISC. Of course, I was
young, and I really didn't know enough about computers to say yea or nay.
Chances are that I still don't. I do know that, after Intel faced competition
from Cyrix and AMD, CISC performance increased dramatically. They say that
Athlon is quite comparable to Alpha. Then again, their designs are remarkably
similar.

>None of the following is your problem of course, but here's my dilemma:
>
>(1) I'm a lazy old man so regardless of circumstances, I don't really expect to
>do much hard computer chess coding in the immediate future.
>
>(2) And any code I do write I would like to have available for any advanced x86
>architecture, or better.  And I much prefer writing in C.  ("C combines the
>power and flexibility of assembler with the clarity and elegance of
>assembler.")*
>
>(3) However, my latest brainstorm, absurd or not,
>
>        http://www.talkchess.com/forums/1/message.html?269426
>
>would seem to absolutely require use of Intel's new hyperthreading facility.
>While at the same time, Intel apparently refuses to be candid about which of
>their current processors support HT, and whether HT will be offered in future
>products.
>
>Note:  if one or more people publicly experiment with my "Hash-it-all!" notion I
>won't feel so conflicted; I can sit back and become an "idea man". ;)
>
>(4) Finally, my friend who owns a small computer farm, to which I have good
>access, uses AMD processors (K6?) exclusively.  Though I don't know if that
>should affect any of my decisions in this matter.
>
>I guess what I'm saying is, "I don't wanna!"  If I write what I consider to be
>reasonable code (and I'm a reasonable guy), I want the compiler to do all the
>magic specific to the system involved, and make it great.
>
>Which is why something like the machine code (re)optimizer that you're working
>on is so important.  "Let the compiler writers worry about it!"  Your work will
>be in a commercial product, I assume.
>
>All just wishful thinking?...
>
>By the way, I never thanked you for running additional tests on optimized
>versions of LSB_64().  So, thanks!
>
>-- Walter
>
>----------
>* My variant of:
>
>1) C is often described, with a mixture of fondness and disdain varying
>according to the speaker, as "a language that combines all the elegance and
>power of assembly language with all the readability and maintainability of
>assembly language." (MIT Jargon Dictionary)
>
>2) "C combines the power of assembler with the portability of assembler." --
>Anonymous, alluding to Bill Thacker.
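
Since LSB_64() came up: for anyone reading along who hasn't seen such a routine, here is a minimal portable C sketch of a 64-bit least-significant-bit scan. It uses the well-known de Bruijn multiplication technique, which is not necessarily the method discussed in this thread. The names `lsb64` and `lsb64_init` are hypothetical; 0x03f79d71b4cb0a89 is a standard 64-bit de Bruijn constant, and the lookup table is built from it at startup rather than typed in by hand.

```c
#include <stdint.h>

/* Standard 64-bit de Bruijn constant: every 6-bit window is distinct. */
#define DEBRUIJN64 0x03f79d71b4cb0a89ULL

/* Maps the top 6 bits of (isolated bit * constant) back to a bit index. */
static int lsb64_index[64];

/* Build the table once: bit i times the constant leaves a unique
   6-bit pattern in bits 58..63. */
void lsb64_init(void)
{
    for (int i = 0; i < 64; i++)
        lsb64_index[((1ULL << i) * DEBRUIJN64) >> 58] = i;
}

/* Index (0..63) of the least significant set bit. b must be nonzero. */
int lsb64(uint64_t b)
{
    uint64_t lowest = b & (0 - b);   /* two's complement trick isolates it */
    return lsb64_index[(lowest * DEBRUIJN64) >> 58];
}
```

Call `lsb64_init()` once at startup; after that, `lsb64(0x28)` returns 3.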

I completely agree that development should be abstracted from assembler.
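
As an example of what I mean, here is a byte table lookup like the one earlier in the thread, written in plain C. The source never mentions a register; an optimizing x86 compiler can emit a single movzx for the widening load instead of the load-into-dl-then-copy sequence discussed above. The table name, its contents (bit counts), and the function names are hypothetical, chosen only to give the lookup something to return.

```c
#include <stdint.h>

/* Hypothetical 256-entry byte table. */
static uint8_t byte_table[256];

/* Fill the table with the number of set bits in each byte value,
   reusing the already-computed entry for i >> 1. */
void byte_table_init(void)
{
    for (int i = 0; i < 256; i++)
        byte_table[i] = (uint8_t)((i & 1) + byte_table[i >> 1]);
}

/* Zero-extending lookup: the uint8_t element is widened to 32 bits
   by the return conversion, which maps naturally onto movzx on x86. */
uint32_t lookup(uint32_t index)
{
    return byte_table[index & 0xFF];
}
```

Compiling with something like `gcc -O2 -S` and reading the output is an easy way to check what a given compiler actually produces.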

The optimizer might be commercial; that's a difficult question. It is related to
a project at work, but that project's scope is far broader than optimization.
Right now I'm not being paid to work on the optimizer, and if I complete it on
my own time, I won't have to worry about intellectual property. I own the code
it's based on, anyway.
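
To give a rough idea of what a machine-code optimizer does at its very smallest scale, here is a toy peephole pass in C. It is only a sketch under stated assumptions: it works on a hypothetical symbolic instruction list (`Insn`, `is`, and `peephole` are made-up names), not real x86 encodings, and it knows a single pattern: zero edx, load a byte into dl, copy edx to eax. It replaces that triple with one movzx into eax and assumes, without checking, that edx is not read afterwards.

```c
#include <string.h>

/* Hypothetical symbolic instruction: opcode plus two operand strings. */
typedef struct {
    char op[8];
    char dst[16];
    char src[16];
} Insn;

/* True if the instruction matches op/dst and, when src is non-NULL, src. */
static int is(const Insn *in, const char *op, const char *dst, const char *src)
{
    return strcmp(in->op, op) == 0 && strcmp(in->dst, dst) == 0 &&
           (src == NULL || strcmp(in->src, src) == 0);
}

/* Rewrite "xor edx,edx / mov dl,[mem] / mov eax,edx" into
   "movzx eax,[mem]" in place. Assumes edx is dead afterwards.
   Returns the new instruction count. */
int peephole(Insn *code, int n)
{
    int out = 0;                     /* write index, never ahead of i */
    for (int i = 0; i < n; i++) {
        if (i + 2 < n &&
            is(&code[i], "xor", "edx", "edx") &&
            is(&code[i + 1], "mov", "dl", NULL) &&
            is(&code[i + 2], "mov", "eax", "edx")) {
            Insn m;
            strcpy(m.op, "movzx");
            strcpy(m.dst, "eax");
            strcpy(m.src, code[i + 1].src);  /* keep the memory operand */
            code[out++] = m;
            i += 2;                          /* skip the matched triple */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

A real optimizer would also have to track register liveness and flags: the original sequence leaves the byte in edx and xor changes the flags, while movzx does neither, so the rewrite is only safe when nothing later depends on those.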

The project it is related to is targeted at game companies, and I suggested that
they might also appreciate such a tool, since it would reduce development time
and yield better performance. If I am paid to complete the optimizer, things
change. Still, since the target market is other businesses, I don't think they
would mind giving it away free for non-commercial use.

It's still in the early stages of development, though. Also, there is no
guarantee that I'll be able to optimize for vectorization, and that's what most
of the new instructions are about anyway. MMX provides integer vectors, 3DNow
provides vectors of 2 floats, and SSE provides vectors of 4 floats, with SSE2
adding integer and double-precision types.
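
For a concrete picture of what "vectors of 4 floats" means, here is a small C sketch that adds two float arrays four lanes at a time using the SSE intrinsics from `<xmmintrin.h>` (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`). A scalar loop handles the leftover elements, and covers the whole array on builds where `__SSE__` is not defined. The function name `add_floats` is hypothetical; this hand-written version is the kind of rewrite a vectorizing optimizer would have to discover on its own.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration in one 128-bit XMM register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE is unavailable. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The unaligned load/store forms avoid requiring 16-byte-aligned arrays, at some cost in speed on processors where the aligned forms are faster.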

-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.