Computer Chess Club Archives


Subject: Re: Memory benchmark comparison DDR333 vs RDRAM PC1066 !

Author: Matt Taylor

Date: 20:52:02 12/02/02


<snip> (let's keep it relevant)
>>You say, "I had some crafty benchmark data from that machine as well as the
>>recent 2800+ AMD and I made the decision based on performance and nothing
>>else...". The only way the 2800+ is going to be any slower is if the binary it
>>ran had the 1.4x speedup problem. Email your source and have them test with one
>>of my binaries. You could also take the dual 2800+ results and multiply them by
>>1.214 (21.4% diff between 1.7 and 1.4).
>>
>>Also you say the + numbers are optimistic.. How so? My 1900+ at 1.8GHz (2200+)
>>is on par with a P4-3GHz in Crafty. A 2600+ (Stock) could beat the P4-3.25GHz in
>>the results list I have AND have room to spare.
>
>
>I don't say that.  AMD said that.  And they are scaling their numbers back.
>They were using a raw estimate that every 66mhz increase in clock speed gave
>them 100mhz of effective speed increase compared to the PIV.  They discovered
>this failed at the beyond 2000 mhz clock frequencies and was "optimistic".
>Again, I assume they know what they are doing since they make the things...

The calculation that AMD originally published was that the performance rating of
a Palomino-core CPU was something like rating = MHz * 3/2 - 500. According to
this, the AthlonXP 2700 should be 2.13 GHz. According to the data, the AthlonXP
2700 on a 133 MHz bus is 2.13 GHz. The only change I see is in the AthlonXP 2700
on the 166 MHz bus, which ends up being clocked a tad lower than the 133 MHz
version. I don't see any information on their website concerning a change in
their rating system for the 133 MHz chips, though. Where did you hear this?
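
To make the arithmetic easy to check, here is a minimal Python sketch of that
formula and its inverse. It assumes the approximate form quoted above
(rating = MHz * 3/2 - 500); the function names are mine, not AMD's, and the
formula itself is only "something like" what was published.

```python
# Sketch of the approximate rating formula quoted above:
#   rating = MHz * 3/2 - 500
# Function names are hypothetical; the formula is an approximation.

def rating_from_mhz(mhz):
    """Approximate model rating for a given core clock in MHz."""
    return mhz * 3 / 2 - 500

def mhz_from_rating(rating):
    """Invert the formula: core clock in MHz implied by a model rating."""
    return (rating + 500) * 2 / 3

if __name__ == "__main__":
    for rating in (1600, 2200, 2700):
        print(rating, "->", round(mhz_from_rating(rating)), "MHz")
```

Under this formula a 2700 rating works out to roughly 2133 MHz, and an 1800 MHz
clock works out to a 2200 rating, which lines up with the "1.8GHz (2200+)"
figure quoted earlier in the thread.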

Also, the official AMD documents state explicitly that the ratings are NOT a
comparison with the Pentium 4. Intel has agreed to a rating system. Until
they finalize a system, AMD is sticking with its current performance rating,
which compares the new Palomino/Thoroughbred/Barton Athlons to the older
Thunderbird Athlons. Each of my AthlonMP 1600 chips is supposedly 33% faster
than my Thunderbird 1.2 GHz (1600 / 1200 = 1.33).

>>By the way, AMD is currently working on Quad Opteron systems and there are test
>>boards available. I don't know if you remember what I said in a previous message
>>(not in this thread) but Opterons are going to have dual-channel DDR and memory
>>banks and that dedicated bandwidth PER CPU.
>>So if you have four CPU's and you're running 200MHz(400DDR) memory w/ dual
>>channels thats 6.4gb/s per cpu. Total you'd have 25.6gb/s bandwidth. Oh, don't
>>forget the CPU has it's own memory controller.
>
>
>Right... and how are you going to read data out of memory where each bank has
>100+ns latency?  I know how supercomputers do it.  I don't think you are going
>to see 4-port memory banks in a machine at that price point...
>
>Theoretical max is a nice concept.  Attainable thruput is more interesting.  At
>present, Intel seems to be leading that list by a significant margin.  Whether
>they will in the future or not I don't know, as I try to not evaluate
>"vaporware".  But for the moment, they are on top in the bandwidth war...  as
>Linpack and other normal memory-intensive applications show time after time.
>
>I hope they can deliver a quad opteron for a reasonable price.  They were
>talking about quad K7's two years ago and not a single instance has shown up
>yet.  Intel talked about the 8-way boxes a while back and delivered a kludge
>there, using a "fusion" chipset to tie two 4-way clusters of processors
>together into a single 8-way box, but with terrible memory performance.  They
>tried to offset that by only offering 2M L2 caches, but that drove the price up
>and didn't help memory-bound large applications at all...  I hope the quad
>opterons don't end up in never-never land as the 8-way boxes did...

You are probably referring to the ccNUMA machines built off of quad-cpu nodes,
and they have scaled higher than that. The largest I've seen has 1,024 CPUs (256
nodes).

Aaron is right about the Opterons -- AMD's 64-bit offerings include a memory
controller on-die for lower latency. The side-effect is that each Sledgehammer
(AMD officially calls the SMP version Sledgehammer) has a dedicated 2.7 GB/sec
memory controller, and memory bandwidth scales as you add more CPUs. (The other
side-effect is that poorly designed operating systems will end up scheduling
threads on non-ideal processors. However, I hear that AMD and Microsoft are
working together for the first time.)
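
For anyone who wants to check the theoretical numbers in this thread, here is a
rough Python sketch of peak DDR bandwidth. It assumes a 64-bit (8-byte) channel
and two transfers per clock, which is standard for DDR; the function name and
parameters are mine. The DDR333 single-channel case is only my guess at one
configuration that lands near 2.7 GB/sec, and these are peak figures, not
attainable throughput.

```python
# Peak (theoretical) DDR bandwidth. Assumptions: 8-byte-wide channel,
# 2 transfers per clock. Helper name and parameters are hypothetical.

def peak_bandwidth_gb_s(clock_mhz, channels=1, bus_bytes=8,
                        transfers_per_clock=2):
    """Peak bandwidth in GB/s (10^9 bytes per second) for one CPU."""
    bytes_per_second = (clock_mhz * 1e6 * transfers_per_clock
                        * bus_bytes * channels)
    return bytes_per_second / 1e9

if __name__ == "__main__":
    # Dual-channel 200 MHz (DDR400), as in the quoted message.
    per_cpu = peak_bandwidth_gb_s(200, channels=2)
    print("per CPU:", per_cpu, "GB/s")
    print("four CPUs:", 4 * per_cpu, "GB/s")
    # Single-channel DDR333 (assumed configuration, about 166.67 MHz).
    print("DDR333 x1:", round(peak_bandwidth_gb_s(166.67), 2), "GB/s")
```

Because each CPU has its own controller, the aggregate peak is just the per-CPU
figure times the CPU count, which is where the 25.6 figure for four CPUs comes
from.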

Assuming that Crafty gains nothing from the 64-bit word size, it will at least
benefit as all recompiled applications will from the fact that the machine has
16 general registers instead of the meager 8 available to 32-bit IA-32 code.

It is also noteworthy that the timings on the AthlonXP are much tighter than on
the P4. I have a benchmark that demonstrates not only the enormously deep P4
pipeline but also that, instruction by instruction, including the SSE
instructions that Intel crafted, the AthlonXP achieves a higher overall IPC.
Source is available for the benchmark, though it is Windows code.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
