Author: Robert Hyatt
Date: 11:02:56 02/03/05
On February 03, 2005 at 13:20:58, Matthew Hull wrote:

>On February 03, 2005 at 11:59:11, Robert Hyatt wrote:
>
>>On February 03, 2005 at 11:17:31, Vincent Diepeveen wrote:
>>
>>>On February 02, 2005 at 11:54:46, Robert Hyatt wrote:
>>>
>>>>On February 02, 2005 at 09:53:03, Vincent Diepeveen wrote:
>>>>
>>>>>On February 01, 2005 at 21:51:19, Robert Hyatt wrote:
>>>>>
>>>>>>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>>>>>>
>>>>>>>Still didn't read the subject title?
>>>>>>>
>>>>>>>[snip]
>>>>>>>
>>>>>>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>>>>>>vector box.
>>>>>>>
>>>>>>>Actually today's clusters deliver a factor of 1000 more or so.
>>>>>>>
>>>>>>>Total bandwidth a cluster can deliver is measured nowadays in terabytes per
>>>>>>>second; with Cray it was measured in gigabytes per second.
>>>>>>
>>>>>>Let's see. The last Cray I ran a chess program on was a T932. Each processor
>>>>>>could read 4 words and write 2 words per cycle, with a cycle time of 2ns. So 6
>>>>>>words, 48 bytes per cycle, x 500M cycles per second is about 24 gigabytes per
>>>>>>second, x 32 processors is getting dangerously close to 800 gigabytes per
>>>>>
>>>>>Bandwidth per cpu on the old MIPS was 3.2 GB/s from memory (Origin3000 series),
>>>>>and bandwidth per cpu on the Altix3000 using network4 is 8.2 gigabytes per second
>>>>>from memory.
>>>>>
>>>>>So what that Cray streamed was impressive for its day, but it delivered to
>>>>>just a few cpu's; that was the main problem. And this for massive power
>>>>>consumption.
>>>>>
>>>>>What we speak of now is that you effectively get the same bandwidth from memory
>>>>>to each cpu, but systems go up to 130000+ processors.
>>>>>
>>>>>>second. A "cluster" can have more theoretical bandwidth, but rarely as much
>>>>>>_real_ bandwidth. This is on a shared-memory machine that can run real codes
>>>>>>quite well.
>>>>>
>>>>>>>
>>>>>>>Note it's the same network that gets used for huge Cray T3E's, but a newer and
>>>>>>>bigger version, that's all.
>>>>>>
>>>>>>The T3E isn't a vector computer.
>>>>>
>>>>>The processor used (the Alpha) was out of order, yet achieves the same main
>>>>>objective: executing more than 1 instruction a cycle effectively.
>>>>>
>>>>>Itanium2, objectively seen, is a vector processor as it executes 2 bundles at
>>>>>once. Though they call that IPF nowadays.
>>>>
>>>>That's not a vector architecture. A vector machine executes _one_ instruction
>>>>and produces a large number of results. For example, current Cray vector boxes
>>>>can produce 128 results by executing a single instruction. That is why MIPS was
>>>>dropped and FLOPS became the measure for high-performance computing.
>>>
>>>IPF executes 2 bundles per cycle.
>>>
>>>1 bundle = 3 instructions in IPF.
>>>
>>>You can see that as a vector.
>>
>>Maybe _you_ can see that as a vector. No person familiar with computer
>>architecture sees that as a vector. No architecture textbook calls that a
>>vector. I stick to common definitions of words, not your privately twisted
>>definitions that nobody can communicate with.
>>
>>Pick up a copy of Hennessy/Patterson's architecture book and look up "vector
>>operations" in the index. VLIW is _not_ vector. "Bundles" are _not_ vector.
>>
>>>
>>>>
>>>>>
>>>>>All x86-64, which are taking over now, do 3 instructions a cycle and
>>>>>deliver up to 2 flops a cycle.
>>>>>
>>>>>>>Crays usually had, when in vector mode, what was it, 4 cpu's or so? Sometimes up
>>>>>>>to 128. Above that it was the T3E, which had Alphas.
>>>>>>>
>>>>>>>That one usually used Quadrics :)
>>>>>>>
>>>>>>>However, look at France now. New great supercomputer. 8192 processors or so.
>>>>>>>Say 2048 nodes. You're looking at 3.6 TB per second bandwidth :)
>>>>>
>>>>>>For a synthetic benchmark, not a real code; that's the problem with clusters so
>>>>>>far...
>>>>>
>>>>>It's the speed the memory delivers to the cpu's.
>>>>>
>>>>>Nothing synthetic.
>>>>
>>>>Didn't think you would understand that. It is about "theoretical peak" vs
>>>>"sustained peak for real applications". The numbers are not the same.
>>>
>>>I know you will act innocent here. But you just have no idea, of course.
>>>
>>>A 'cheapo' high-end network card can get 800Mb/s.
>>
>>Not if two machines try to talk to the same node it can't. Not if there is
>>congestion in the router it can't. Not if there are multiple router hops
>>between the two points it can't. Etc...
>>
>>>
>>>>>
>>>>>>>Those Crays you remember were 100MHz ones. The network could of course deliver
>>>>>>>exactly what the cpu could calculate.
>>>>>>
>>>>>>There was no "network", and the Crays were 500MHz, although on a fully pipelined
>>>>>>vector machine that can do 5-10 operations per cycle that is not exactly a good
>>>>>>measure of performance.
>>>>>
>>>>>Operations don't count. Flops do.
>>>>>
>>>>>There is actually a 1GHz Cray here with 256KB cache.
>>>>
>>>>Don't know what Cray you have there, but the cache is not normal cache. It is
>>>>only for "scalar" operations. Vector operations don't go through the scalar
>>>>cache on a Cray, because all memory reads and writes are pipelined and deliver
>>>>values every clock cycle after the latency delay for the first word.
>>>
>>>www.cray.com
>>
>>What should I look for? I have the manuals for every vector machine they have
>>ever produced/shipped in my office.
>>
>>
>>>
>>>>>
>>>>>>>Not so great if you look at the total number of Gflops it delivered. Nowadays
>>>>>>>the big clusters, as all big supercomputers nowadays are clusters, are measured
>>>>>>>in Tflops, and one already in Pflops :)
>>>>>>>
>>>>>>>There is a 0.36 Pflop one under construction now :)
>>>>>>>
>>>>>>>Vincent
>>>>>
>>>>>>Different computers for different applications. Ask a real programmer which he
>>>>>>would rather write code for...
>>>>>
>>>>>They'll all pick the fastest machine, which is the one delivering the most flops.
>>>>>
>>>>>Vincent
>>>>
>>>>No. Otherwise there would be no machines like the Cray, Fujitsu, Hitachi, etc.
>>>>left. Using a large number of processors in a message-passing architecture is
>>>>not as easy as using a small number of faster processors in a shared-memory
>>>>architecture, for many applications.
>>>
>>>You confuse one kind of network card with another. There are also network cards
>>>that allow DSM. See www.quadrics.com for details. They have RAM on the chip. No
>>>need for message passing. You can read remotely from that RAM straight over the
>>>network without the remote cpu needing to handle a message.
>>
>>Please get into the world of standard definitions. It does not matter whether the
>>remote CPU has to see the "message" or not. It is still "message passing". The
>>VIA architecture used by our cLAN stuff supports that type of shared memory, but
>>the two cards still send messages over the network router. Shared-memory
>>systems don't send "messages".
>>
>>
>>
>>>
>>>Note all those high-end vendors are in serious financial trouble now.
>>>Probably several will go bankrupt soon. SGI right now can stay alive a bit thanks
>>>to Intel giving them Itanium2 cpu's for near free. Cray already has major problems.
>>>They are pretty expensive anyway. 50k dollars for 1 node is what I call
>>>expensive. 1 node has 12 cpu's (Opterons).
>>>
>>>Just about everything above 4 cpu's will die. (High-end) network cards will take
>>>over.
>>>
>>>Programmers simply are cheaper than hardware; they will have to adapt to
>>>networks.
>>>
>>>Vincent
>>
>>That last statement is so far beyond false it takes sunlight six years from the
>>time it reaches "false" until it reaches that statement. _Any_ good CS book
>>today _always_ contains the quote: "In the 60's, the cost of developing software
>>was _far_ exceeded by the cost of the hardware it was run on, so efficiency of
>>the programming itself was paramount.
>>Today the cost of developing the software
>>far exceeds the cost of the system it runs on, so controlling the development
>>cost is what software engineering is all about."
>>
>>
>>That was not just a wrong statement, it was a _grossly_ wrong statement.
>>
>>Pick up any good software engineering / software development textbook, and
>>learn something.
>>
>>
>>Who out there besides Vincent thinks hardware costs exceed software development
>>costs in today's computing world???
>
>
>Indeed. The biggest line item in a corporation's cost structure is the payroll.
>Programmers are the first thing to go in cost-reduction efforts, because the
>numbers are so big.

It was a completely idiotic statement that 99.99999% of the world knows is wrong. Too many different CS books can be found to show it is wrong...