Author: Matthew Hull
Date: 10:20:58 02/03/05
On February 03, 2005 at 11:59:11, Robert Hyatt wrote:

>On February 03, 2005 at 11:17:31, Vincent Diepeveen wrote:
>
>>On February 02, 2005 at 11:54:46, Robert Hyatt wrote:
>>
>>>On February 02, 2005 at 09:53:03, Vincent Diepeveen wrote:
>>>
>>>>On February 01, 2005 at 21:51:19, Robert Hyatt wrote:
>>>>
>>>>>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>>>>>
>>>>>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>>>>>
>>>>>>Still didn't read the subject title?
>>>>>>
>>>>>>[snip]
>>>>>>
>>>>>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>>>>>vector box.
>>>>>>
>>>>>>Actually, today's clusters deliver a factor of 1000 more or so.
>>>>>>
>>>>>>Total bandwidth a cluster can deliver is measured nowadays in terabytes per
>>>>>>second; with Cray it was measured in gigabytes per second.
>>>>>
>>>>>Let's see. The last Cray I ran a chess program on was a T932. The processor
>>>>>could read 4 words and write 2 words per cycle, and the cycle time was 2ns. So 6
>>>>>words, 48 bytes per cycle, x 500M cycles per second is about 2.5 gigabytes per
>>>>>second, x 32 processors is getting dangerously close to 100 gigabytes per
>>>>
>>>>Bandwidth per cpu on the old MIPS machines was 3.2 GB/s from memory (Origin 3000
>>>>series), and per-cpu bandwidth on the Altix 3000 using network4 is 8.2 gigabytes
>>>>per second from memory.
>>>>
>>>>So what Cray streamed there was impressive for its day, but it delivered to
>>>>just a few cpu's; that was the entire main problem. This for massive power
>>>>consumption.
>>>>
>>>>What we speak of now is that you get effectively the same bandwidth from memory
>>>>to each cpu, but systems go up to 130000+ processors.
>>>>
>>>>>second. A "cluster" can have more theoretical bandwidth, but rarely as much
>>>>>_real_ bandwidth. This is on a shared-memory machine that can run real codes
>>>>>quite well.
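[Editor's note: the peak-bandwidth arithmetic Hyatt describes above can be sketched as a quick calculation. This is a hedged illustration of the words-per-cycle reasoning, not anything from the post itself; as an aside, 48 bytes/cycle at 500 MHz works out to 24 GB/s per CPU, so the "2.5 gigabytes per second" figure in the quoted text appears to be a slip.]

```python
# Peak memory bandwidth from words-per-cycle and clock rate,
# using the Cray T932 figures quoted in the thread.
WORD_BYTES = 8  # one Cray word = 64 bits

def peak_bandwidth(words_per_cycle, clock_hz):
    """Peak bytes/second a single CPU can move to/from memory."""
    return words_per_cycle * WORD_BYTES * clock_hz

# T932 per the post: 4 reads + 2 writes per cycle, 2 ns cycle time (500 MHz).
per_cpu = peak_bandwidth(4 + 2, 500e6)
print(per_cpu / 1e9)       # 24.0 GB/s per CPU
print(32 * per_cpu / 1e9)  # 768.0 GB/s peak across 32 CPUs
```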
>>>>
>>>>>>
>>>>>>Note it's the same network that gets used for huge Cray T3E's, but a newer and
>>>>>>bigger version, that's all.
>>>>>
>>>>>The T3E isn't a vector computer.
>>>>
>>>>The processor used (the Alpha) was out of order, yet achieves the same main
>>>>objective: executing more than 1 instruction a cycle effectively.
>>>>
>>>>The Itanium2, objectively seen, is a vector processor, as it executes 2 bundles
>>>>at once. Though they call that IPF nowadays.
>>>
>>>That's not a vector architecture. A vector machine executes _one_ instruction
>>>and produces a large number of results. For example, current Cray vector boxes
>>>can produce 128 results by executing a single instruction. That is why MIPS was
>>>dropped and FLOPS became the measure for high-performance computing.
>>
>>IPF executes 2 bundles per cycle.
>>
>>1 bundle = 3 instructions in IPF.
>>
>>You can see that as a vector.
>
>Maybe _you_ can see that as a vector. No person familiar with computer
>architecture sees that as a vector. No architecture textbook calls that a
>vector. I stick to common definitions of words, not your privately twisted
>definitions that nobody can communicate with.
>
>Pick up a copy of Hennessy/Patterson's architecture book and look up "vector
>operations" in the index. VLIW is _not_ vector. "Bundles" are _not_ vector.
>
>>
>>>
>>>>
>>>>All the x86-64s which are taking over now do 3 instructions a cycle and
>>>>deliver up to 2 flops a cycle.
>>>>
>>>>>>Crays usually had, when in vector mode, what was it, 4 cpu's or so? Sometimes
>>>>>>up to 128. Above that it was the T3E, which had Alphas.
>>>>>>
>>>>>>That one usually used Quadrics :)
>>>>>>
>>>>>>However, look at France now. A new great supercomputer: 8192 processors or so.
>>>>>>Say 2048 nodes. You're looking at 3.6 TB per second bandwidth :)
>>>>
>>>>>For a synthetic benchmark, not a real code; that's the problem with clusters so
>>>>>far...
>>>>
>>>>It's the speed the memory delivers to the cpu's.
>>>>
>>>>Nothing synthetic.
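[Editor's note: the distinction Hyatt draws above can be made concrete. A vector instruction is one opcode applied across an entire vector register, while a VLIW/EPIC bundle is just a few independent scalar instructions issued together. A toy model in Python, with hypothetical names, standing in for hardware:]

```python
# One *vector* instruction: a single opcode producing many results
# (a Cray-style 128-element vector register).
def vector_add(va, vb):
    """One instruction, up to 128 results."""
    return [a + b for a, b in zip(va, vb)]

# One *VLIW* bundle: several independent scalar ops issued together
# (Itanium-style: 3 instruction slots per bundle).
def issue_bundle(ops):
    """Each slot is a separate scalar instruction; one result per slot."""
    return [op() for op in ops]

va, vb = list(range(128)), list(range(128))
print(len(vector_add(va, vb)))  # 128 results from one instruction
print(len(issue_bundle([lambda: 1 + 1, lambda: 2 * 2, lambda: 5 - 3])))  # 3 results from 3 instructions
```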
>>>
>>>Didn't think you would understand that. It is about "theoretical peak" vs
>>>"sustained peak for real applications". The numbers are not the same.
>>
>>I know you will act innocent here. But you just have no idea, of course.
>>
>>A 'cheapo' high-end network card can get 800Mb/s.
>
>Not if two machines try to talk to the same node it can't. Not if there is
>congestion in the router it can't. Not if there are multiple router hops
>between the two points it can't. Etc...
>
>>
>>>>
>>>>>>Those Crays you remember were 100 MHz ones. The network could of course deliver
>>>>>>exactly what the cpu could calculate.
>>>>>
>>>>>There was no "network", and the Crays were 500 MHz, although on a fully
>>>>>pipelined vector machine that can do 5-10 operations per cycle that is not
>>>>>exactly a good measure of performance.
>>>>
>>>>Operations don't count. Flops do.
>>>>
>>>>There is actually a 1 GHz Cray here with 256KB cache.
>>>
>>>Don't know what Cray you have there, but the cache is not normal cache. It is
>>>only for "scalar" operations. Vector operations don't go through the scalar
>>>cache on a Cray, because all memory reads and writes are pipelined and deliver
>>>values every clock cycle after the latency delay for the first word.
>>
>>www.cray.com
>
>What should I look for? I have the manuals for every vector machine they have
>ever produced/shipped in my office.
>
>>
>>>>
>>>>>>Not so great if you look at the total number of Gflops it delivered. Nowadays
>>>>>>the big clusters, as all big supercomputers nowadays are clusters, are measured
>>>>>>in Tflops, and one already in Pflops :)
>>>>>>
>>>>>>There is a 0.36 Pflop one now under construction :)
>>>>>>
>>>>>>Vincent
>>>>
>>>>>Different computers for different applications. Ask a real programmer which he
>>>>>would rather write code for...
>>>>
>>>>They'll all pick the fastest machine, which is the one delivering the most flops.
>>>>
>>>>Vincent
>>>
>>>No.
>>>Otherwise there would be no machines like the Cray, Fujitsu, Hitachi, etc.
>>>left. Using a large number of processors in a message-passing architecture is
>>>not as easy as using a small number of faster processors in a shared-memory
>>>architecture, for many applications.
>>
>>You confuse one kind of network card with another. There are also network cards
>>that allow DSM. See www.quadrics.com for details. They have RAM on the chip. No
>>need for message passing. You can read straight over the network from that remote
>>RAM without needing a message to be handled by the remote cpu.
>
>Please get into the world of standard definitions. It does not matter whether the
>remote CPU has to see the "message" or not. It is still "message passing". The
>VIA architecture used by our cLAN stuff supports that type of shared memory, but
>the two cards still send messages over the network router. Shared-memory
>systems don't send "messages".
>
>>
>>Note all those high-end vendors are in serious financial trouble now.
>>Probably several will go bankrupt soon. SGI right now can stay alive a bit thanks
>>to Intel giving them Itanium2 cpu's for near free. Cray already has major
>>problems. They are pretty expensive anyway. 50k dollars for 1 node is what I call
>>expensive. 1 node has 12 cpu's (Opterons).
>>
>>Just about everything above 4 cpu's will die. (High-end) network cards take
>>over.
>>
>>Programmers simply are cheaper than hardware; they will have to adapt to
>>networks.
>>
>>Vincent
>
>That last statement is so far beyond false it takes sunlight six years from the
>time it reaches "false" until it reaches that statement. _Any_ good CS book
>today _always_ contains the quote: "In the 60's, the cost of developing software
>was _far_ exceeded by the cost of the hardware it was run on, so efficiency of
>the programming itself was paramount.
>Today the cost of developing the software
>far exceeds the cost of the system it runs on, so controlling the development
>cost is what software engineering is all about."
>
>That was not just a wrong statement, it was a _grossly_ wrong statement.
>
>Pick up any good software engineering / software development textbook, and
>learn something.
>
>Who out there besides Vincent thinks hardware costs exceed software development
>costs in today's computing world???

Indeed. The biggest line item in a corporation's cost structure is the payroll.
Programmers are the first thing to go in cost-reduction efforts, because the
numbers are so big.
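[Editor's note: the thread's Quadrics/DSM argument turned on what counts as "message passing" versus shared memory. A toy sketch of the difference, with made-up names and Python queues standing in for hardware, not any real API: with shared memory every CPU issues plain loads and stores to one address space, while a remote read, even a "one-sided" one the remote CPU never handles, still moves a request and a reply over the interconnect.]

```python
import queue

# Shared memory: both "CPUs" load/store the same address space directly.
shared = {"counter": 0}
def cpu_store(addr, value):  # a plain store
    shared[addr] = value
def cpu_load(addr):          # a plain load
    return shared[addr]

# Message passing: data moves only via a request crossing a link,
# even if the remote CPU never runs a handler (the NIC answers).
link = queue.Queue()
remote_ram = {"counter": 42}
def nic_remote_read(addr):
    link.put(("read", addr))  # request goes over the network
    op, a = link.get()        # NIC on the far side picks it up
    return remote_ram[a]      # ...and replies with the data

cpu_store("counter", 7)
print(cpu_load("counter"))         # 7: one load, no messages
print(nic_remote_read("counter"))  # 42: same kind of read, but via a message
```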
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.