Computer Chess Club Archives


Subject: Re: 64-bit machines

Author: Matt Taylor

Date: 21:14:46 02/08/03


On February 07, 2003 at 08:09:23, Tom Kerrigan wrote:

>On February 07, 2003 at 03:10:46, Matt Taylor wrote:
>
>>There is another subtle difference, too; IA-64 is heavily optimized in software
>>whereas IA-32 is heavily optimized in hardware. In IA-64 it is possible to
>>achieve rates closer to the theoretical 6 instructions per clock than it is on
>>IA-32.
>
>Possibly only because it runs at a much lower clock speed.

Um, possibly because that is the philosophy in VLIW chip design...

I stick a bunch of execution units (carefully picked, of course) in my CPU, just
as I would if I were building the next Pentium. The difference is that I don't
spend a lot of transistors on instruction-reordering hardware to extract more
parallelism; I just let the compiler schedule code for my specific mix of units.

IA-64 comes much closer to its theoretical speed because of features like
predication and its hardware loop counter. (Plus it uses a register stack,
similar to SPARC's register windows.)
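
To make the predication point concrete, here is a rough C sketch of the idea. It
is not IA-64 code. On IA-64 the compiler attaches a predicate register to
individual instructions, so both arms of a short branch can issue without a
jump; the closest portable analogy is to compute a mask from the condition and
select with it. The function names (select_u32, max_branchy, max_predicated) are
made up for illustration.

```c
#include <stdint.h>

/* Branchy version: the hardware has to predict the comparison. */
uint32_t max_branchy(uint32_t a, uint32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Select one of two values from a predicate without a jump.
   mask is all ones when pred is nonzero, all zeros otherwise. */
uint32_t select_u32(int pred, uint32_t if_true, uint32_t if_false)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(pred != 0);
    return (if_true & mask) | (if_false & ~mask);
}

/* "If-converted" version: both candidates are available, and the
   predicate picks the result. */
uint32_t max_predicated(uint32_t a, uint32_t b)
{
    return select_u32(a > b, a, b);
}
```

Both versions return the same value; the second simply trades a possible
mispredicted branch for a couple of extra ALU operations, which is the kind of
trade a predicating compiler makes for short branches.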

>>The IA-64 is probably extremely nice to compute with (6 MB L2 cache!!) if you
>
>Sort of. A 3GHz P4 outscores a 900MHz McKinley by 67% at SPECint2k, which is
>what's important for computer chess. McKinley is good at SPECfp2k. Maybe that's
>what you're referring to.

No, actually. I have never used a McKinley; I've only seen it on paper. Still,
the P4 3.06 GHz has 512K of L2 cache, and the McKinley has 3 or 6 MB. Now I
can't remember whether 6 MB is Itanium-III or McKinley.

I have my doubts about how well those SPECint numbers carry over; others in this
thread have already posted empirical data showing Itanium is faster than the P4
at chess.

>>Athlon64 will support all of these instructions. Yes, it is a waste when
>>significant portions of the CPU core are dedicated to MMX/SSE and no compiler
>>can generate MMX/SSE code, but an astute assembly programmer can write code for
>
>The Intel compiler can generate SSE2 (instead of x87) for floating point
>calculations. I believe gcc has library functions that make use of MMX.

This is not the same as saying "the compiler can vectorize code." I can
hand-tweak my routines all day and then show you how I've made terrific use of
MMX and SSE. I have to use intrinsics or inline assembly in order to do so. The
compiler will -not- generate MMX/SSE instructions to vectorize my code. It will
not do 64-bit computations in MMX registers. If the Intel compiler will use
scalar SSE, good for them, but none of the other major compilers generate -any-
MMX or SSE instructions for C code. I would be surprised if any C compiler does;
Intel doesn't go further than scalar FP code.
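
For reference, this is roughly what the hand-written route looks like with SSE2
intrinsics: two 64-bit bitboards ANDed in a single 128-bit operation. It is a
minimal sketch only. The function names and the HAVE_SSE2 macro are my own, and
the code falls back to plain C when SSE2 is not available at compile time.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Plain C: AND two pairs of 64-bit bitboards, one pair at a time. */
void and_pairs_scalar(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    out[0] = a[0] & b[0];
    out[1] = a[1] & b[1];
}

/* Intrinsics: load both bitboards of each operand into one 128-bit
   register and AND them with a single instruction. */
void and_pairs_simd(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
#if HAVE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(va, vb));
#else
    and_pairs_scalar(a, b, out);   /* no SSE2: same result, scalar path */
#endif
}
```

The point is that the programmer chose the registers and the instruction here;
nothing in and_pairs_scalar asks for it.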

Using SSE to do scalar floating-point calculations isn't a real big thing,
either. It is marginally faster (eliminates some overhead), but how does that
benefit chess?

>I wouldn't say MMX or SSE uses significant portions of the CPU core, relatively
>speaking. The difference between a Pentium and a Pentium MMX is ~1M transistors,
>and probably most of those were devoted to doubling the L1 cache sizes, not to
>MMX functionality. The difference between the Pentium 2 and the Pentium 3 (with
>SSE) is ~2M transistors. I guess you can decide for yourself if these numbers
>are significant.

MMX alone eats roughly 10% of an older Athlon die -- about 4M transistors on a
42M transistor chip. That is pretty significant.

-Matt



