Computer Chess Club Archives

Subject: Re: Memory benchmark comparison DDR333 vs RDRAM PC1066 !

Author: Matt Taylor

Date: 15:01:48 12/02/02

>>>I hope they can deliver a quad Opteron for a reasonable price. They were
>>>talking about quad K7s two years ago and not a single instance has shown up
>>>yet. Intel talked about the 8-way boxes a while back and delivered a kludge
>>>there, using a "fusion" chipset to tie two 4-way clusters of processors
>>>together into a single 8-way box, but with terrible memory performance. They
>>>tried to offset that by only offering 2M L2 caches, but that drove the price
>>>up and didn't help memory-bound large applications at all... I hope the quad
>>>Opterons don't end up in never-never land as the 8-way boxes did.
>>
>>Here's a picture of a quad Opteron system, if for some reason you think it's
>>never going to happen...
>>http://www.amdzone.com/articleimages/cpu/hammer/4popt.JPG
>>There are many dual Opterons out as well.
>
>They had "pictures" of quad K7 MBs as well. Never saw one on the street,
>however. Again, I don't see how to evaluate what's gonna be. Just what it is
>that we can get our hands on...

Actually, Opteron scales to 8 processors and should scale up further. Opteron is
a ccNUMA architecture, unlike current mainstream x86 systems. (If you look
around, you -CAN- find ccNUMA systems built from dual-Xeon and quad-Xeon nodes.)

>>>If I recall, the 4-way dual 2.0GHz Xeon is the fastest PC-class machine
>>>around right now, by a wide margin. And the heavier the load placed on it,
>>>the wider that gap becomes...

That depends on what you're doing, who you ask, and how you optimize. One can
summarize by saying that Athlon will run existing code -much- faster than a P4.
Hand-tweaked programs can run faster on P4, and memory-intensive algorithms
employing SSE will run faster on P4.

On a P4, code can be more lax with stack usage because the P4 has a lower L1
latency (2 clocks) than Athlon (3 clocks). On Athlon, a shift (1 clock) is less
costly than on a P4 (4-6 clocks). Keep in mind, too, that address generation on
the P4 goes through the shifter because the dedicated AGU circuit was removed.
Many other differences exist, and there are many well-written articles
documenting them, though I don't think I've seen any on Tom's Hardware.
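
To make the shift example concrete, here is a minimal sketch in C of the same
operation (multiply by 8) written two ways. The function names are made up for
illustration, and the comments only restate the clock figures quoted above; an
optimizing compiler may well turn both into the same instruction, so you would
have to check the generated assembly before timing anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale by 8 with a single shift.
   Per the figures above, this is the cheap form on Athlon (1 clock). */
static uint32_t scale8_shift(uint32_t x) {
    return x << 3;
}

/* Hypothetical helper: scale by 8 with three dependent additions.
   This avoids the shifter, which the post describes as slow on the P4. */
static uint32_t scale8_adds(uint32_t x) {
    x += x; /* 2x */
    x += x; /* 4x */
    x += x; /* 8x */
    return x;
}

/* Sanity check: both forms agree, including on unsigned wraparound. */
static void check_equivalence(void) {
    uint32_t samples[] = { 0u, 1u, 5u, 1000u, 0x20000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(scale8_shift(samples[i]) == scale8_adds(samples[i]));
    }
}
```

Which form wins depends on the chip, which is the whole point: the same source
tuned one way for P4 is not the tuning you want for Athlon.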

A careful observer will also note that code that runs efficiently on a P4 will
likely make inefficient use of the Athlon and vice versa. It's unfortunate that
benchmarks do not include source. Is this the principle of cold fusion research?
