Author: Robert Hyatt
Date: 20:48:52 12/06/02
On December 06, 2002 at 16:44:43, Matt Taylor wrote:

>On December 06, 2002 at 07:32:57, Vincent Diepeveen wrote:
>
>>On December 05, 2002 at 01:14:18, Jeremiah Penery wrote:
>>
>>>On December 04, 2002 at 23:23:32, Robert Hyatt wrote:
>>>
>>>>>Current AthlonMP chipsets also have a separate bus per CPU. They use the same
>>>>>EV6 bus as Alpha processors did (or still do?). The memory modules are shared,
>>>>>whereas Hammer will have separate memory modules for each processor.
>>>>
>>>>
>>>>The problem with that is it turns into a NUMA architecture which has its _own_
>>>>set of problems. One cpu connected to one memory module means that the other
>>>>CPU can't get to it as efficiently...
>>>>
>>>>IE this doesn't offer one tiny bit of improvement over a SMP-type machine with
>>>>shared memory... Unless the algorithm is specifically designed to attempt to
>>>>localize memory references and duplicate data that is needed by both threads
>>>>often...
>>>>
>>>>This might be an improvement for running two programs at once. For one
>>>>program using two processors, NUMA offers additional challenges for the
>>>>parallel programmer...
>>>
>>>According to all documentation, which I have no reason to doubt, a non-local
>>>memory access in a Hammer system is just as fast as a memory access in a
>>>processor/chipset combination where the memory controller resides in the
>>>northbridge (i.e. all other x86 configurations). Local memory accesses are
>>>quite a lot faster. Therefore, the average case, even in 8-way machines that
>>>take up to 3 hops for a memory access, is still below that of any x86 machine of
>>>today.
>>
>>If you read the documentation as it is you get confronted with
>>theoretical data which doesn't take into account any part of
>>the configuration which is worst case.
>>
>>Bob is more near the truth here than you might want to guess, because
>>as soon as you go run on those supercomputers with theoretic performance
>>of a certain peak and you go test yourself then the practical peak
>>is up to 50 times slower than the theoretic data suggests.
>>
>>So on paper this is way faster and even works up to 8 cpu's (which is
>>unlikely we ever will see working), as good propagandists those papers
>>are not going to tell you weak spots in the design which prevent
>>that *theoretic* performance from happening in reality.
>>
>>In case they get this dual CPU to work we will see what its speed is.
>>
>>For now i assume it's a cluster like Bob does.
>>
>>Note that it's nearly impossible to get to work a 8 cpu machine with
>>that architecture. Imagine how complex design of it will be.
>>
>>Which OS will work on that?
>>
>>Best regards,
>>Vincent
>
>First of all, this is a crossbar. Other crossbar systems have scaled up to 64
>nodes or so I've heard. Crossbar performance is -much- better than your typical
>NUMA system. Economic crossbar systems take the same approach AMD is taking:
>each node adds a crossbar to the system. I don't think that's coincidence.

Yes, but check the math. Cray scaled their T90 up to 32 processors. Over 1/2 the
_total_ cost of the machine was in the memory interconnections (crossbar).

But someone is wrong somewhere, and I don't claim to be an AMD expert. However,
if a processor has a dedicated path to memory there is no crossbar. If it has a
crossbar path to memory there is no dedicated path. How would you do _both_
without it smelling like NUMA?

>
>Second of all, any OS that supports SMP on shared-bus will support Opteron. All
>of the cache coherency and switching is done in hardware. Optimization can be
>made by recognition that this is a NUMA architecture. However, I think the MP
>1.4 spec which has been available for a couple years allows the specification of
>NUMA configurations. Linux64 and Windows XP 64-bit are old, old announcements.
>
>Third of all, AMD demonstrated a quad-Opteron system at Computex Taipei 2002 (in
>addition to a fair number of other shows). The biggest hurdle in 8-way Opterons
>is finding PCB real-estate. I don't think AMD would be promising such systems if
>they were uncertain whether or not they could deliver, even if other companies
>(Rambus) do.
>
>Right now performance of Opteron systems is admittedly bad, but the performance
>of most prototypes is. As for the chips themselves, an 800 MHz Clawhammer
>prototype was reportedly faster than the 1.6 GHz Willamette in 32-bit code.
>
>-Matt
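
A rough way to put numbers on the two arguments in this thread is sketched below. It is only an illustration, not anything measured on Hammer or a T90. The function names are made up for this example, and the latency figures (80 ns local, 140 ns remote) are hypothetical placeholders. The first function counts the switch points in a full crossbar, which grow as the product of CPUs and memory banks; that is one reason the interconnect can dominate the cost of a large machine. The second computes the expected memory latency of a NUMA-style system from the fraction of accesses that stay local, which is why localizing memory references matters to the parallel programmer.

```python
def crossbar_crosspoints(cpus: int, memory_banks: int) -> int:
    """Switch points in a full crossbar joining every CPU to every bank."""
    return cpus * memory_banks


def average_latency_ns(local_ns: float, remote_ns: float,
                       local_fraction: float) -> float:
    """Expected latency when local_fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Crosspoint count grows quadratically when CPUs and banks scale together.
    for n in (2, 8, 32):
        print(f"{n} CPUs x {n} banks: {crossbar_crosspoints(n, n)} crosspoints")

    # Hypothetical latencies, chosen only to show the shape of the trade-off.
    LOCAL_NS, REMOTE_NS = 80.0, 140.0
    for fraction in (0.5, 0.9, 1.0):
        avg = average_latency_ns(LOCAL_NS, REMOTE_NS, fraction)
        print(f"{fraction:.0%} local accesses: {avg:.1f} ns on average")
```

With these placeholder numbers, a program that keeps only half of its references local sees an average closer to the remote figure than a program that keeps 90% local, so whether the average beats a shared northbridge depends on the real latencies and on how well the algorithm localizes its data.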
Last modified: Thu, 15 Apr 21 08:11:13 -0700