Computer Chess Club Archives

Subject: Re: Possible small improvement to hacky method

Author: Walter Faxon
Date: 22:59:51 12/08/02
On December 08, 2002 at 05:39:16, Matt Taylor wrote:

>On December 07, 2002 at 23:40:38, Walter Faxon wrote:
>
>>On December 07, 2002 at 14:14:32, Matt Taylor wrote:
>>
>>>On December 06, 2002 at 22:50:40, Walter Faxon wrote:
>>>
>>>>On December 06, 2002 at 05:33:42, Matt Taylor wrote:
>>>>
>>>><snip>
>>>>>
>>>>>It's also a tad strange that the code loads dl and then copies edx into eax. It would be more direct to simply store the table value in eax.
>>>>>
>>>><snip>
>>>>
>>>>I get the feeling that, for the compiler in question at least, once it decides
>>>>that a register is going to be used as an address or offset, it loses or ignores
>>>>its knowledge of the register's bitwise mapping.  It preps edx to receive the
>>>>byte in dl, uses eax to load dl, then copies the whole thing to eax.  If it used
>>>>eax to write to al directly, the compiler would think it still needs to mask out
>>>>the (already zeroed) rest of eax afterwards.  So it does it this way because the
>>>>reg-reg copy is faster and the edx prep can be overlapped with other work.
>>>>Anyway, that's a possible explanation.  One would need detailed knowledge of the
>>>>compiler to know for sure.  (And don't get mad at the compiler writers; writing
>>>>good compilers is hard work!)
>>>>
>>>>-- Walter
>>>
>>>Yeah, but the x86 architecture has a movzx instruction for that very purpose.
>>>AMD manuals actually advise that it is faster to use movzx than the equivalent
>>>sequence...
>>>
>>>And yeah, I know compiler writing is very difficult. I have actually been
>>>working on one for various reasons. The difference is that I have a human
>>>optimizer. :-)
>>>
>>>Actually I was working on an optimizer that takes machine code and produces more
>>>optimal machine code (which is what I will spit my compiler output through when
>>>it's done).
>>>
>>>-Matt
>>
>>
>>Man, it's clear I gotta get a PIV asm book (or the AMD equivalent).  All I got
>>is an old 486 manual and that says movzx is slower than mov, 3 clocks to 1.
>>
>>A machine code optimizer is terrifically general.  Let us know when you're done;
>>a lot of us will be interested.
>>
>>-- Walter
>
>Yeah, movzx -used- to be slower. You won't find timing data past the Pentium.
>The most comprehensive timing reference I've ever seen was documented by a
>company called Quantasm. I can't find them on the web anymore and can only
>presume they've gone out of business. However, I managed to find those old docs
>and upload them to my website:
>http://my.fit.edu/~mtaylor/opcode_i.html "Integer" ops (ALU/system)
>http://my.fit.edu/~mtaylor/opcode_f.html FPU ops
>
>I dug up links to the latest manuals in case you were interested. Personally, I
>prefer the Intel manuals as I find them easier to navigate. Intel Volume 2 has
>an exhaustive instruction listing. The AMD manuals contain certain subsets
>(system, general, FPU/3DNow, MMX/SSE) in different manuals.
>
>Optimization references are a bit harder to come by. The only stuff I have is
>specific to the Pentium 2/3. I might also have some early Athlon optimization
>docs lying around somewhere...
>
>AMD x86-64 manuals:
>http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_4699_875%5e7044,00.html?redir=CPX801
>
>Intel P4 manuals:
>http://developer.intel.com/design/pentium4/manuals/
>
>FYI you can order a hard copy of the x86-64 manuals from AMD's website for free
>last I checked. Most of the application stuff is identical to current x86
>architecture. The only differences I can think of offhand besides the 64-bitness
>are the REX encodings, RIP-relative addressing, and registers r8-r15 and
>xmm8-xmm15.
>
>-Matt

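On the movzx point quoted above, here is a minimal C sketch of the operation in question: load one table byte and widen it to a full register with the upper bits cleared. The names demo_table, load_zero_extended and load_sign_extended are made up for illustration and are not from the code discussed in this thread. Whether a given compiler emits a single movzx or a "clear the register, then load the low byte" pair depends on the compiler and its target settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table standing in for the thread's lookup table. */
const uint8_t demo_table[4] = { 0, 7, 200, 255 };

/* Load one byte and widen it to 32 bits with the upper 24 bits cleared.
   An x86 compiler may implement this as a single movzx, or as
   "zero the register, then load the low byte"; both give this result. */
uint32_t load_zero_extended(const uint8_t *table, size_t len, size_t index)
{
    assert(index < len);
    return table[index];
}

/* The same byte read as signed: widening now copies the sign bit upward
   (the movsx case). Converting values above 127 to int8_t is
   implementation-defined in C; mainstream x86 compilers wrap to negative. */
int32_t load_sign_extended(const uint8_t *table, size_t len, size_t index)
{
    assert(index < len);
    return (int8_t)table[index];
}
```

For a table of small indices the two functions agree; they only differ once a stored byte has its top bit set, which is why the zero-extending form is the one that matters for a lookup table.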

Thanks much for the references, Matt!  Now got lots of stuff on order...

Gee, these computers are getting so much more complicated, and with substantial
extra vendor-specific features; I'd be almost happy to return to the days
(mid-70's) of the famous CDC 6600 supercomputer where all you had to do was
arrange your code to do some work during the read latency.  (Well, really not:
60-bit words but only an 18-bit address space, so 1.875 MB of user-addressable
storage, with a (then astounding) 10 MHz clock; but when Seymour Cray left
Control Data he is supposed to have said, "The 6600 is my last _slow_ machine.")
Anyway, I only had access to a 6600 for work, not chess. :(
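
For the curious, the 1.875 MB figure is just 2^18 words times 60 bits, divided by 8 bits per byte and counted in units of 2^20 bytes. A small C sketch of that arithmetic follows; the function name addressable_bytes is my own invention for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of storage reachable with `address_bits` bits of word addressing
   when each word holds `word_bits` bits. Result is in 8-bit bytes. */
uint64_t addressable_bytes(unsigned address_bits, unsigned word_bits)
{
    assert(address_bits < 32);              /* keep the product well inside 64 bits */
    uint64_t words = (uint64_t)1 << address_bits;
    return words * word_bits / 8;
}
```

Calling addressable_bytes(18, 60) gives 1,966,080 bytes, and 1,966,080 / 1,048,576 is exactly 1.875.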

Whatever happened to the RISC architectures that were supposed to make our lives
so much easier?

None of the following is your problem of course, but here's my dilemma:

(1) I'm a lazy old man so regardless of circumstances, I don't really expect to
do much hard computer chess coding in the immediate future.

(2) And any code I do write I would like to have available for any advanced x86
architecture, or better.  And I much prefer writing in C.  ("C combines the
power and flexibility of assembler with the clarity and elegance of
assembler.")*

(3) However, my latest brainstorm, absurd or not,

        http://www.talkchess.com/forums/1/message.html?269426

would seem to absolutely require use of Intel's new hyperthreading facility.
At the same time, Intel apparently refuses to be candid about which of
their current processors support HT, and whether HT will be offered in future
products.

Note:  if one or more people publicly experiment with my "Hash-it-all!" notion I
won't feel so conflicted; I can sit back and become an "idea man". ;)

(4) Finally, my friend who owns a small computer farm, to which I have good
access, uses AMD processors (K6?) exclusively.  Though I don't know if that
should affect any of my decisions in this matter.
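
Regarding point (3): whatever the marketing material says, the processor itself can be asked. Below is a minimal sketch, assuming a GCC- or Clang-style <cpuid.h> on an x86 target. CPUID leaf 1 reports an HTT flag in EDX bit 28 and a logical-processor count field in EBX bits 23:16. The function names here (htt_flag_set, logical_ids_from_ebx, cpu_reports_htt) are hypothetical. Note the flag only says the package can expose more than one logical processor; it does not by itself prove that hyperthreading is enabled, so treat the result as a hint.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* provides __get_cpuid() on GCC and Clang */
#define HAVE_CPUID_H 1
#endif

#define CPUID1_EDX_HTT (1u << 28)   /* CPUID leaf 1, EDX bit 28 */

/* Decode the HTT flag from a CPUID leaf 1 EDX value. */
int htt_flag_set(uint32_t edx)
{
    return (edx & CPUID1_EDX_HTT) != 0;
}

/* Decode EBX bits 23:16 of CPUID leaf 1: the number of logical processor
   IDs per physical package. Only meaningful when the HTT flag is set. */
unsigned logical_ids_from_ebx(uint32_t ebx)
{
    return (ebx >> 16) & 0xFFu;
}

/* Returns 1 if the CPU reports HTT, 0 if it does not,
   and -1 if CPUID leaf 1 cannot be queried in this build. */
int cpu_reports_htt(void)
{
#ifdef HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return htt_flag_set(edx);
#else
    return -1;
#endif
}
```

Keeping the bit decoding in separate functions means it can be checked on any machine, including non-x86 ones, while only cpu_reports_htt() touches the hardware.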

I guess what I'm saying is, "I don't wanna!"  If I write what I consider to be
reasonable code (and I'm a reasonable guy), I want the compiler to do all the
magic specific to the system involved, and make it great.

Which is why something like the machine code (re)optimizer that you're working
on is so important.  "Let the compiler writers worry about it!"  Your work will
be in a commercial product, I assume.

All just wishful thinking?...

By the way, I never thanked you for running additional tests on optimized
versions of LSB_64().  So, thanks!

-- Walter

----------
* My variant of:

1) C is often described, with a mixture of fondness and disdain varying
according to the speaker, as "a language that combines all the elegance and
power of assembly language with all the readability and maintainability of
assembly language." (MIT Jargon Dictionary)

2) "C combines the power of assembler with the portability of assembler." --
Anonymous, alluding to Bill Thacker.