Computer Chess Club Archives


Subject: Re: 64-bit machines

Author: Robert Hyatt

Date: 19:47:01 02/09/03

On February 09, 2003 at 03:21:45, Tom Kerrigan wrote:

>On February 09, 2003 at 00:14:46, Matt Taylor wrote:
>
>>On February 07, 2003 at 08:09:23, Tom Kerrigan wrote:
>>
>>>On February 07, 2003 at 03:10:46, Matt Taylor wrote:
>>>
>>>>There is another subtle difference, too; IA-64 is heavily optimized in software
>>>>whereas IA-32 is heavily optimized in hardware. In IA-64 it is possible to
>>>>achieve rates closer to the theoretical 6 instructions per clock than it is on
>>>>IA-32.
>>>
>>>Possibly only because it runs at a much lower clock speed.
>>
>>Um, possibly because that is the philosophy in VLIW chip design...
>>
>>I stick a bunch of execution units (carefully picked, of course) in my CPU, just
>>as I would if I were building the next Pentium. The difference is that I don't
>>waste a lot of transistors on reordering and such to get more parallelism; I
>>just let the compiler optimize for my specific mix.
>>
>>IA-64 comes much closer to theoretical speed because of things like predication
>>and its loop counter. (Plus it uses a register stack like Sparc.)
>
>You're assuming that software scheduling does a better job than hardware
>scheduling but you have no data to back up that assumption. Prefetching and
>predication are very poor substitutes for out-of-order execution. They make
>writing software (or at least compilers) more difficult and they often waste
>valuable memory bandwidth and execution units.
>
>As for the SPARC register stack, it's widely accepted that it doesn't
>significantly improve performance and it makes the register file big enough to
>hurt clock speed (which is one of the main reasons why IA-64 chips are clocked
>so slow). It all but prevents register file duplication or caching, like in
>Alphas...

The claim to fame for the SPARC approach is simply "fast procedure calls":
no register saving or restoring on a call.  It was a necessary trade-off,
since the first SPARCs didn't have hardware integer multiply/divide; every
multiply or divide became a library-routine call, which made procedure
calls very frequent.
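
To make that concrete: with no hardware multiply, every integer multiply in
the source compiles into a call to a small library routine that does the
work by shift-and-add, roughly like the sketch below (illustrative only,
not the actual SunOS .umul code):

/* Illustrative only -- not the actual SunOS .umul routine.  With no
   hardware multiply instruction, a*b turns into a call to something
   roughly like this shift-and-add loop: */
unsigned soft_umul(unsigned a, unsigned b) {
  unsigned product = 0;
  while (b) {
    if (b & 1)          /* low bit of multiplier set: add in shifted a */
      product += a;
    a <<= 1;            /* shift the multiplicand */
    b >>= 1;            /* consume one bit of the multiplier */
  }
  return product;
}
/* Every x*y in the source becomes a call like soft_umul(x, y), so calls
   are everywhere -- exactly the pattern register windows speed up. */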



>
>>No, actually. I have never used a McKinley; I've only seen it on paper. Still,
>>the P4 3.06 GHz has 512K of L2 cache, and the McKinley has 3 or 6 MB. Now I
>>can't remember whether 6 MB is Itanium-III or McKinley.
>
>Doesn't matter for computer chess. Every program I know about (with the
>exception of HIARCS) has a working set of < 256k.

I have one that doesn't fit your working set limit...

I.e., my attack lookup tables are [64][256] arrays of 8-byte entries, which
works out to 128K bytes each if my math is right (64 x 256 x 8 = 131,072
bytes).  I can point out four of those that are used _everywhere_, and that
is only a start.  I'd suspect my "working set" probably comes closer to
1-2MB for the engine alone...
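
If anyone wants to check that arithmetic, here is a rough sketch of the
kind of tables I mean (illustrative declarations, not the actual ones from
any particular engine):

/* Rough sketch with illustrative names -- rotated-bitboard style attack
   tables indexed by [square][occupancy byte], 8 bytes per entry, so one
   table is 64 * 256 * 8 = 131,072 bytes = 128KB. */
#include <stdio.h>

typedef unsigned long long Bitboard;       /* 8 bytes */

static Bitboard rank_attacks[64][256];     /* 128KB */
static Bitboard file_attacks[64][256];     /* 128KB */
static Bitboard diaga1_attacks[64][256];   /* 128KB */
static Bitboard diagh1_attacks[64][256];   /* 128KB */

int main(void) {
  printf("one table:   %lu KB\n",
         (unsigned long) sizeof(rank_attacks) / 1024);        /* 128 */
  printf("four tables: %lu KB\n",
         (unsigned long) (4 * sizeof(rank_attacks)) / 1024);  /* 512 */
  return 0;
}

Four tables like that are already 512KB before anything else is counted.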

>
>>>>significant portions of the CPU core are dedicated to MMX/SSE and no compiler
>>>>can generate MMX/SSE code, but an astute assembly programmer can write code
>>>
>>>The Intel compiler can generate SSE2 (instead of x87) for floating point
>>>calculations. I believe gcc has library functions that make use of MMX.
>>
>>This is not the same as saying "the compiler can vectorize code." I can
>
>Right. You said generate MMX/SSE code, not vectorize code.
>
>>MMX alone eats more than 10% of an older Athlon die -- about 4M transistors on a
>>42M transistor chip. 10% is pretty significant.
>
>Where did you get that number?
>
>-Tom
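
For what it's worth, the scalar-vs-vectorized distinction argued above
looks roughly like this in practice (an illustrative sketch with SSE2
intrinsics, not code from any of the programs or compilers discussed):

/* add_scalar is the kind of code a compiler that merely "generates SSE2"
   produces: one scalar add per element, the same dataflow as x87.
   add_packed is what vectorizing means: the packed form, two doubles per
   instruction, written here with SSE2 intrinsics. */
#include <emmintrin.h>   /* SSE2 intrinsics */

void add_scalar(double *a, const double *b, int n) {
  int i;
  for (i = 0; i < n; i++)
    a[i] += b[i];                        /* scalar: one add at a time */
}

void add_packed(double *a, const double *b, int n) {
  int i;
  for (i = 0; i + 1 < n; i += 2) {       /* packed: two doubles at a time */
    __m128d va = _mm_loadu_pd(a + i);
    __m128d vb = _mm_loadu_pd(b + i);
    _mm_storeu_pd(a + i, _mm_add_pd(va, vb));
  }
  for (; i < n; i++)                     /* odd element left over */
    a[i] += b[i];
}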


