Computer Chess Club Archives



Subject: Re: Optimizing C code for speed

Author: Bo Persson

Date: 17:27:45 01/04/03



On January 04, 2003 at 12:59:12, Matt Taylor wrote:

>On January 04, 2003 at 09:26:47, Bo Persson wrote:
>
>>On January 03, 2003 at 19:14:35, Matt Taylor wrote:
>>
>>>
>>>That is definitely the best way to learn. Pick up an Intel manual (if you are
>>>interested in learning Intel assembly), write a little C code, and study the
>>>compiler output. Once you are familiar with the machine and instruction set, you
>>>can learn all the assembly optimization tricks out of AMD/Intel optimization
>>>manuals.
>>>
>>>I have a copy of the Intel 386 manual converted to HTML. The architecture hasn't
>>>changed significantly since the 386.
>>
>>No, because that defined the x86 architecture.  :-)
>>
>>Unfortunately the instruction timings have changed. Several times. In different
>>directions.
>>
>>Sigh!
>>
>>
>>Bo Persson
>>bop2@telia.com
>
>The timings are always changing, but there is little point anymore in paying
>attention to specific timings. Many other optimizations (branch elimination,
>unrolling, vectorization, etc.) pay off big, and if you write code that executes
>well on P5/P6 core (Pentium, PPro, Pentium 2, etc.), it usually runs well on
>modern K7/P7 as well.

Sometimes, sometimes not.

My complaints are about inconsistency. Intel first introduced BSR/BSF to get some
(reasonably) fast bit instructions. Then they made them soooo slow on the P5
that they were actually slower than a table lookup. How do you do that?!

Then on the P6 they were extremely fast, only to be slow again on the Pentium 4...
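
For anyone who has not seen the table lookup alternative, here is a rough sketch of one way to do it in C. The names (lsb_table, init_lsb_table, bsf_table, bsf_loop) are made up for this illustration, and this is only one of several possible table layouts, not a tuned routine. Whether it beats the BSF instruction depends entirely on the processor, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical, chosen for this sketch only. */

/* lsb_table[b] = index of the lowest set bit in byte b (for b != 0). */
static unsigned char lsb_table[256];

/* Fill the table once at startup. */
static void init_lsb_table(void)
{
    int b, bit;

    lsb_table[0] = 0; /* never read: bsf_table rejects a zero argument */
    for (b = 1; b < 256; b++) {
        bit = 0;
        while (((b >> bit) & 1) == 0)
            bit++;
        lsb_table[b] = (unsigned char)bit;
    }
}

/* Table-driven stand-in for BSF on a non-zero 32-bit value:
   find the lowest non-zero byte, then look up its lowest set bit. */
static int bsf_table(uint32_t x)
{
    assert(x != 0);
    if (x & 0x000000FFu) return lsb_table[x & 0xFFu];
    if (x & 0x0000FF00u) return 8 + lsb_table[(x >> 8) & 0xFFu];
    if (x & 0x00FF0000u) return 16 + lsb_table[(x >> 16) & 0xFFu];
    return 24 + lsb_table[x >> 24];
}

/* Plain shift loop, included only as a reference to check the table against. */
static int bsf_loop(uint32_t x)
{
    int i = 0;

    assert(x != 0);
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}
```

Call init_lsb_table() once before the first bsf_table() call. The only way to know which variant wins on a given chip is to time both.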

On the other hand, MOVZX and MOVSX, which we have been taught *not* to use,
ever, are now suddenly in the core set for the Pentium 4. They are among the rare 13
instructions that can actually execute 2 per clock per ALU. **)
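
As a small illustration of where these two instructions come from in C code: widening a byte to a 32-bit value is the typical case. The function names below are invented for this sketch, and the instruction choice is an assumption about common x86 compilers, not a guarantee. Check the assembler output of your own compiler and flags.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */

/* Zero-extending load of an unsigned byte.
   On x86, compilers commonly (not always) emit MOVZX for this. */
static uint32_t widen_unsigned(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* Sign-extending load of a signed byte.
   On x86, compilers commonly (not always) emit MOVSX for this. */
static int32_t widen_signed(const int8_t *p)
{
    return (int32_t)*p;
}
```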

>Also, timings on superscalar processors are less meaningful because of
>out-of-order execution engines.
>
>I'm optimizing some routines (~20-30 instructions) now that execute around 16
>cycles, and I have only squeezed out 1 cycle over the course of 5 hours. It's
>just not worth it unless you have a lot of time or no better way to achieve
>performance.
>
>-Matt

**) Limited to a total of 3 by the trace cache...


Bo Persson
bop2@telia.com




Last modified: Thu, 15 Apr 21 08:11:13 -0700
