Author: Robert Hyatt
Date: 10:22:10 10/15/03
On October 15, 2003 at 12:21:17, Vincent Diepeveen wrote:

>On October 15, 2003 at 10:41:37, Robert Hyatt wrote:
>
>Bob,
>
>go to www.cray.com and see how many instructions this thing can put through a
>clock :)
>
>Nowadays they are 1Ghz and have a 256KB cache too.

First, we are talking about the C90. It had no cache. It executed one
instruction per clock. As did/does the T90.

I have no idea what you are looking at. I am looking at the Cray Research
publication CSM-0500-000. I don't think you are going to find anything on the
web site that contradicts that.

Your number is ridiculous. The Cray has 8 address registers. 8 scalar
registers. 8 vector registers. When an instruction addresses any of those as
a destination, that register is unavailable for several clock cycles. No way
to issue 19 instructions. With only 8 vector registers, there is no way to
issue more than 8 vector instructions, even if it _could_.

Your comprehension is failing you. Remember, I'm not guessing as you are, I
have actually programmed these things for 20 years. I'll be happy to give you
a couple of names of software/hardware folks up there that will set you
straight, although I am sure you will argue with them as well.

>
>But just imagine that they would not be able to do 29 instructions a clock but
>just 1 or 2.

I don't have to imagine that. I did it for 20 years. The Cray 1 through the
T90 issued _one_ instruction per clock cycle. Of course, once a single vector
instruction is issued, it executes for up to 64 clock cycles producing a new
floating point result every clock, and while this is going on, every cycle yet
another instruction can issue. But _never_ more than one issue per cycle. And
I do mean _never_. You might have several instructions busy at any one cycle,
but it only starts one new instruction every cycle _max_ and often less than
that.

>
>Then any x86 processor blows them away at floating point :)

You are stupid beyond belief. I explained vector processing.
You are still arguing "instructions per clock". It is hopeless trying to
explain this to a brain-dead person that simply refuses to look something up
and read it carefully.

You are _wrong_. You are almost always _wrong_. And you will _continue_ to be
wrong on this subject until you grasp vectors and the difference between
"operations" and "instructions". Until then it is hopeless...

>
>>On October 15, 2003 at 10:01:44, Vincent Diepeveen wrote:
>>
>>>On October 15, 2003 at 09:32:10, Robert Hyatt wrote:
>>>
>>>>On October 14, 2003 at 17:29:18, Vincent Diepeveen wrote:
>>>>
>>>>>On October 14, 2003 at 16:18:28, Robert Hyatt wrote:
>>>>>
>>>>>>On October 14, 2003 at 14:29:36, Gerd Isenberg wrote:
>>>>>>
>>>>>>>On October 14, 2003 at 14:15:33, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On October 14, 2003 at 14:13:08, Gerd Isenberg wrote:
>>>>>>>>
>>>>>>>>>On October 14, 2003 at 10:07:10, Ricardo Gibert wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>http://www.wired.com/news/technology/0,1282,60791,00.html
>>>>>>>>>>
>>>>>>>>>>Can this be productively used in a chess program?
>>>>>>>>>
>>>>>>>>>I don't know, similar hardware resources may be more productive for chess, if
>>>>>>>>>implemented as hyperthreading devices. I guess it's a kind of further
>>>>>>>>>development of SSE and AltiVec technology. With huge register files
>>>>>>>>>(N * 64 * 64|128|256-bit?) and probably SIMD-wise integer instructions
>>>>>>>>>(including popcount?) and fast memory interface, I can imagine that it is
>>>>>>>>>useful for a lot of nice things, like some eval passes, e.g. a first square
>>>>>>>>>wise and a final scalar product pass. And fill-attack generation, e.g. square
>>>>>>>>>wise in all 16 directions with a specialized dumb fill routine.
>>>>>>>>>
>>>>>>>>>Gerd
>>>>>>>>
>>>>>>>>this is just floating point arrays.
>>>>>>>
>>>>>>>Aha, well may be a matter of interpretation.
>>>>>>>I haven't seen any instruction set yet.
>>>>>>>
>>>>>>>On the other hand, if float and double arithmetic becomes as fast (or faster) as
>>>>>>>integer, why not use it for eval purposes?
>>>>>>>
>>>>>>>Gerd
>>>>>>
>>>>>>
>>>>>>Correct. We did this on the Cray. FP was very fast there and it frees
>>>>>>up integer registers for addresses and array indices...
>>>>>
>>>>>That's of course true however at 16 processors of 100Mhz you reached 500k nodes
>>>>>a second with cray blitz.
>>>>>
>>>>>Each Cray processor can issue up to 29 instructions a cycle.
>>>>
>>>>I have no idea what you are talking about. Each cray processor can issue
>>>>_one_ instruction per cycle.
>>>>
>>>>however, doing vector stuff, in one cycle the machine can do four memory
>>>>reads and two memory writes (8 byte words) per processor. It can also do
>>>>multiple things in one cycle with vector chaining, but it never issues more
>>>>than one instruction per cycle per cpu.
>>>>
>>>>I don't know what data you are looking at, but it is wrong.
>>>>
>>>>>
>>>>>Crafty at a 1.6Ghz K7 which can issue up to 3 instructions a cycle gets 1
>>>>>million nodes a second.
>>>>>
>>>>>So something capable of 100M * 16 * 29 = 46.4G instructions a cycle you get 500k
>>>>>nps because it is a vector machine
>>>
>>>Bob cut the crap.
>>>
>>>If cray would execute 1 instruction a cycle then the processors would be
>>>10 times slower than any other solution.
>>
>>Vincent, wanna make a bet? Any amount of money you care to put on it.
>>
>>The cray issues one instruction per cycle. Of course, you have _no idea_
>>of what a vector machine does and how it does it, so you aren't going to
>>understand anything about the machine. But one instruction per cycle per
>>processor is _it_.
>>
>>You can find this in any good Cray Reference. I'll be happy to xerox a page
>>from the C90 hardware reference manual that gives this info.
>>
>>Next, do you understand the difference between an _instruction_ and an
>>_operation_? Didn't think so.
>>The cray has a set of vector instructions where _one_ instruction produces
>>multiple results by operating on a vector.
>>But it can't _issue_ more than one instruction per cycle. It is possible that
>>by issuing multiple consecutive instructions, you "chain" vector functional
>>units together and produce multiple _operations_ per cycle. But _not_
>>multiple instructions.
>>
>>Why don't you try to talk about something you know something about, if there
>>is such a topic? And stop trying to talk "cray" to someone that has actually
>>_used_ them for 20+ years?
>>
>>>
>>>Yet everyone loves crays because they are vector processors which can do up to
>>>29 instructions a cycle.
>>
>>Nope.
>>
>>One instruction per cycle. Try this on for size:
>>
>>Cray Y-MP C90 System Programmer Reference Manual, CSM-0500-000
>>
>>"A fetch sequence begins immediately and transfers a block of instructions
>>from memory to an instruction buffer. The issue sequence then selects the
>>instruction indicated by the program address (P) register, decodes it,
>>determines whether the required registers or functional units are available,
>>and if so, allows the instruction to be executed.
>>
>>As the instruction executes, the P register increments, causing a new
>>instruction to be selected from the instruction buffer."
>>
>>The above happens _once_ per processor cycle.
>>
>>Again, you don't understand what vector processing is all about.
>>
>>>
>>>Even a P5/100 would have been faster than a cray because it can do 2
>>>instructions a clock at 100Mhz.
>>
>>
>>So? How long would it take that P5/100 to execute (say) a floating point
>>add? The cray does one in 3 cycles. But if it is a vector instruction,
>>after the first result pops out after 3 cycles, the next result pops out
>>one cycle later, and this continues until the vector has been completely
>>processed.
>>
>>Can your P5 do one floating add per cycle? Didn't think so.
>>After you issue several floating point vector instructions (here is an
>>example):
>>
>>   v0   v1+v2
>>   v3   v4+v5
>>   v6   v0*v3
>>
>>after three cycles, we have three instructions being executed, one issued
>>per cycle. after three cycles, the first v0 value is completed and a new
>>one is completed every cycle after that. After 4 cycles, the first v3 value
>>is completed and one is completed every cycle after that. After 8 cycles,
>>the first v6 value is completed and one every cycle after that. From this
>>point forward, we are doing two floating adds and one floating multiply
>>every clock cycle. Can your P5 do that?
>>
>>The cray was _not_ a fast scalar machine. Again something you don't understand.
>>It _is_ one hell of a fast vector machine, if you would only look.
>>
>>
>>
>>
>>
>>
>>>
>>>You know that a cray can do 29 and i do. So cut this incredible nonsense right
>>>here.
>>
>>You are producing the nonsense. I just quoted _directly_ from the C90
>>manual and that was the machine you were quoting for my 500K nodes per
>>second.
>>
>>
>>>
>>>If you would have vectorized cray blitz correctly it would have run of course
>>>faster than 500k nps. More like 5MLN nps at a 16 processor 100Mhz cray.
>>
>>you have no idea what "vectorized CB correctly" means, obviously, since you
>>don't have a clue what "vectorized" means as you have shown many times over
>>the past 8 years.
>>
>>Grow up and learn to understand before spouting nonsense.
>>
>>
>>>
>>>Thank you,
>>>Vincent
>>>
>>>>Again, you make up numbers that have nothing to do with reality. A Cray
>>>>can issue one instruction per cycle. The C90 I used for the ICCA DTS
>>>>article had a clock cycle time of 4.167 nanoseconds, the standard C90 clock
>>>>speed. That is about 250 million instructions per second per processor. With
>>>>16 processors, that is 4 billion, not your mythical 46.4 billion.
>>>>How about you start writing about things you know something about, and
>>>>stop making stuff up about things you don't have a clue about?
>>>>
>>>>>
>>>>>Something capable of 4.8G instructions a cycle you get 1 MLN nps because it is a
>>>>>x86 processor.
>>>>
>>>>
>>>>Pure garbage calculations don't convince anybody of anything.
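
Editor's sketch, not part of the original post: the three-instruction chaining example quoted above can be checked with a few lines of Python. This is only a toy model under stated assumptions. It reads each example line as a destination register followed by its operands, takes the first-result cycles (3, 4 and 8) straight from the post, assumes back-to-back issue at cycles 0, 1 and 2, and assumes full 64-element vectors. The names CHAIN, issues_per_cycle, results_per_cycle and peak_issue_rate are made up for illustration, and nothing here models real C90 functional-unit latencies or register reservations.

```python
VECTOR_LENGTH = 64  # assumed full-length vectors ("up to 64 clock cycles" in the post)

# (label, issue cycle, cycle of first result).
# First-result cycles 3, 4 and 8 are the ones quoted in the post;
# issue cycles 0, 1 and 2 are an assumption (one issue per cycle, back to back).
CHAIN = [
    ("v0 = v1 + v2", 0, 3),
    ("v3 = v4 + v5", 1, 4),
    ("v6 = v0 * v3", 2, 8),
]


def issues_per_cycle(chain, n_cycles):
    """Count how many instructions start (issue) in each cycle."""
    counts = [0] * n_cycles
    for _label, issue_cycle, _first in chain:
        counts[issue_cycle] += 1
    return counts


def results_per_cycle(chain, vector_length=VECTOR_LENGTH):
    """Count how many element results (operations) complete in each cycle."""
    n_cycles = max(first for _label, _issue, first in chain) + vector_length
    counts = [0] * n_cycles
    for _label, _issue, first in chain:
        # One new element result per cycle, starting at the first-result cycle.
        for cycle in range(first, first + vector_length):
            counts[cycle] += 1
    return counts


def peak_issue_rate(cycle_time_ns, processors):
    """Upper bound on instructions per second at one issue per cycle per CPU."""
    return processors / (cycle_time_ns * 1e-9)


if __name__ == "__main__":
    results = results_per_cycle(CHAIN)
    issues = issues_per_cycle(CHAIN, len(results))
    print("max instructions issued in any cycle:", max(issues))
    print("max operations completed in any cycle:", max(results))
    print("peak issue rate, 16 CPUs at 4.167 ns: %.2e per second"
          % peak_issue_rate(4.167, 16))
```

In this model no cycle ever issues more than one instruction, yet from cycle 8 onward three operations (two adds and one multiply) complete per cycle, which is the instruction/operation distinction the post is arguing about. The last function gives the issue-rate ceiling for a 4.167 ns clock on 16 processors, which lands a little under the rounded 4 billion quoted above.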
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.