Author: Matt Taylor
Date: 20:52:02 12/02/02
<snip> (let's keep it relevant)

>>You say, "I had some crafty benchmark data from that machine as well as the
>>recent 2800+ AMD and I made the decision based on performance and nothing
>>else...". The only way the 2800+ is going to be any slower is if the binary it
>>ran had the 1.4x speedup problem. Email your source and have them test with one
>>of my binaries. You could also take the dual 2800+ results and multiply them by
>>1.214 (21.4% diff between 1.7 and 1.4).
>>
>>Also you say the + numbers are optimistic.. How so? My 1900+ at 1.8GHz (2200+)
>>is on par with a P4-3GHz in Crafty. A 2600+ (Stock) could beat the P4-3.25GHz in
>>the results list I have AND have room to spare.
>
>I don't say that. AMD said that. And they are scaling their numbers back. They
>were using a raw estimate that every 66mhz increase in clock speed gave them 100mhz
>of effective speed increase compared to the PIV. They discovered this failed at the
>beyond 2000 mhz clock frequencies and was "optimistic". Again, I assume they know
>what they are doing since they make the things...

The calculation that AMD originally published was that the performance rating of a Palomino-core CPU was something like rating = MHz * 3/2 - 500. According to this, the AthlonXP 2700 should be 2.13 GHz. According to the data, the AthlonXP on a 133 MHz bus is 2.13 GHz. The only change I see is in the AthlonXP 2700 on the 166 MHz bus, and it ends up being clocked a tad lower than the 133 MHz version. I don't see any information on their website concerning a change in their rating system for the 133 MHz chips, though. Where did you hear this?

Also, the AMD official documents state explicitly that the ratings are NOT a comparison with the Pentium 4. Intel has agreed to a rating system. Until they finalize a system, AMD has resolved to keep its current performance rating, which compares the new Palomino/Thoroughbred/Barton Athlons to the older Thunderbird Athlons.
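
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the approximate formula quoted above (rating = MHz * 3/2 - 500) and its inverse. The function names are mine and purely illustrative, and the formula is only the "something like" approximation from this post, not an official AMD definition.

```python
# Approximate performance-rating formula as quoted in this post:
#   rating = MHz * 3/2 - 500
# Function names are hypothetical; this is not an official AMD definition.

def rating_from_mhz(mhz):
    """Approximate model rating for a given clock speed in MHz."""
    return mhz * 3 / 2 - 500


def mhz_from_rating(rating):
    """Invert the formula: clock speed in MHz implied by a model rating."""
    return (rating + 500) * 2 / 3


if __name__ == "__main__":
    # A 2700 rating implies roughly 2133 MHz, i.e. about 2.13 GHz.
    print(round(mhz_from_rating(2700)))
```

The inverse is just the same line solved for MHz, so any rating/clock pair can be checked in either direction.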
Each of my AthlonMP 1600 chips is supposedly 33% faster than my Thunderbird 1.2 GHz (1600 / 1200 = 1.33).

>>By the way, AMD is currently working on Quad Opteron systems and there are test
>>boards available. I don't know if you remember what I said in a previous message
>>(not in this thread) but Opterons are going to have dual-channel DDR and memory
>>banks and that dedicated bandwidth PER CPU.
>>So if you have four CPU's and you're running 200MHz(400DDR) memory w/ dual
>>channels thats 6.4gb/s per cpu. Total you'd have 25.6gb/s bandwidth. Oh, don't
>>forget the CPU has it's own memory controller.
>
>Right... and how are you going to read data out of memory where each bank has 100+ns
>latency? I know how supercomputers do it. I don't think you are going to see 4-port memory
>banks in a machine at that price point...
>
>Theoretical max is a nice concept. Attainable thruput is more interesting. At present, Intel
>seems to be leading that list by a significant margin. Whether they will in the future or not
>I don't know, as I try to not evaluate "vaporware". But for the moment, they are on top in the
>bandwidth war... as Linpack and other normal memory-intensive applications show time after
>time.
>
>I hope they can deliver a quad opteron for a resonable price. They were talking about quad
>K7's two years ago and not a single instance has shown up yet. Intel talked about the 8-way
>boxes a while back and delivered a kludge there, using a "fusion" chipset to tie two 4-way
>clusters of processors together into a single 8-way box, but with terrible memory performance.
>They tried to offset that by only offering 2M L2 caches, but that drove the price up and didn't
>help memory-bound large applications at all... I hope the quad opterons don't end up in
>never-never land as the 8-way boxes did...

You are probably referring to the ccNUMA machines built off of quad-cpu nodes, and they have scaled higher than that.
The largest I've seen has 1,024 CPUs (256 nodes).

Aaron is right about the Opterons -- AMD's 64-bit offerings include a memory controller on-die for lower latency. The side effect is that each Sledgehammer (AMD officially calls the SMP version Sledgehammer) has a dedicated 2.7 GB/sec memory controller, and memory bandwidth scales as you add more CPUs. (The other side effect is that poorly designed operating systems will end up scheduling threads on non-ideal processors. However, I hear that AMD and Microsoft are working together for the first time.)

Assuming that Crafty gains nothing from the 64-bit word size, it will at least benefit, as all recompiled applications will, from the fact that the machine has 16 general registers instead of the meager 8 available to 32-bit IA-32 code.

It is also noteworthy that the timings on AthlonXP are much tighter than on P4. I have a benchmark that demonstrates not only the enormously deep P4 pipeline but also that on individual instructions, including the SSE instructions that Intel crafted, the AthlonXP achieves a higher overall IPC. Source is available for the benchmark, though it is Windows code.
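
The bandwidth figures quoted earlier in this post (6.4 GB/s per CPU, 25.6 GB/s across four CPUs) are peak numbers, and they fall out of simple arithmetic. Here is a rough Python sketch, assuming a standard 64-bit (8-byte) DDR channel and one memory controller per CPU. The function and constant names are made up for illustration, and none of this says anything about attainable throughput or latency.

```python
# Peak (theoretical) DDR bandwidth arithmetic. All names are illustrative.
# Assumption: each DDR channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8


def peak_mb_per_sec(mega_transfers, channels=1, cpus=1):
    """Peak bandwidth in MB/s.

    mega_transfers: millions of transfers per second (400 for DDR400)
    channels:       memory channels per CPU
    cpus:           number of CPUs, each assumed to have its own controller
    """
    return mega_transfers * BYTES_PER_TRANSFER * channels * cpus


if __name__ == "__main__":
    print(peak_mb_per_sec(400, channels=2))          # one CPU, dual-channel DDR400
    print(peak_mb_per_sec(400, channels=2, cpus=4))  # four such CPUs
    print(peak_mb_per_sec(333))                      # one single DDR333 channel
```

The last call gives about 2.7 GB/s, which lines up with the per-CPU figure I mentioned if that figure assumes a single DDR333 channel. That reading is an assumption on my part.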