Author: Vincent Diepeveen
Date: 09:21:17 10/15/03
On October 15, 2003 at 10:41:37, Robert Hyatt wrote:

Bob, go to www.cray.com and see how many instructions this thing can put through a clock :)

Nowadays they are 1Ghz and have a 256KB cache too.

But just imagine that they would not be able to do 29 instructions a clock but just 1 or 2. Then any x86 processor blows them away at floating point :)

>On October 15, 2003 at 10:01:44, Vincent Diepeveen wrote:
>
>>On October 15, 2003 at 09:32:10, Robert Hyatt wrote:
>>
>>>On October 14, 2003 at 17:29:18, Vincent Diepeveen wrote:
>>>
>>>>On October 14, 2003 at 16:18:28, Robert Hyatt wrote:
>>>>
>>>>>On October 14, 2003 at 14:29:36, Gerd Isenberg wrote:
>>>>>
>>>>>>On October 14, 2003 at 14:15:33, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On October 14, 2003 at 14:13:08, Gerd Isenberg wrote:
>>>>>>>
>>>>>>>>On October 14, 2003 at 10:07:10, Ricardo Gibert wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>http://www.wired.com/news/technology/0,1282,60791,00.html
>>>>>>>>>
>>>>>>>>>Can this be productively used in a chess program?
>>>>>>>>
>>>>>>>>I don't know, similar hardware resources may be more productive for chess, if
>>>>>>>>implemented as hyperthreading devices. I guess it's a kind of further
>>>>>>>>development of SSE and AltiVec technology. With huge register files
>>>>>>>>(N * 64 * 64|128|256-bit?) and probably SIMD-wise integer instructions
>>>>>>>>(including popcount?) and fast memory interface, i can imagine that it is
>>>>>>>>useful for a lot of nice things, like some eval passes, e.g. a first square
>>>>>>>>wise and a final scalar product pass. And fill-attack generation, e.g. square
>>>>>>>>wise in all 16 directions with a specialized dumb fill routine.
>>>>>>>>
>>>>>>>>Gerd
>>>>>>>
>>>>>>>this is just floating point arrays.
>>>>>>
>>>>>>Aha, well may be a matter of interpretation.
>>>>>>I haven't seen any instruction set yet.
>>>>>>
>>>>>>On the other hand, if float and double arithmetic becomes as fast (or faster) as
>>>>>>integer, why not use it for eval purposes?
>>>>>>
>>>>>>Gerd
>>>>>
>>>>>
>>>>>Correct. We did this on the Cray. FP was very fast there and it frees
>>>>>up integer registers for addresses and array indices...
>>>>
>>>>That's of course true however at 16 processors of 100Mhz you reached 500k nodes
>>>>a second with cray blitz.
>>>>
>>>>Each Cray processor can issue up to 29 instructions a cycle.
>>>
>>>I have no idea what you are talking about. Each cray processor can issue
>>>_one_ instruction per cycle.
>>>
>>>however, doing vector stuff, in one cycle the machine can do four memory
>>>reads and two memory writes (8 byte words) per processor. It can also do
>>>multiple things in one cycle with vector chaining, but it never issues more
>>>than one instruction per cycle per cpu.
>>>
>>>I don't know what data you are looking at, but it is wrong.
>>>
>>>>
>>>>Crafty at a 1.6Ghz K7 which can issue up to 3 instructions a cycle gets 1
>>>>million nodes a second.
>>>>
>>>>So something capable of 100M * 16 * 29 = 46.4G instructions a cycle you get 500k
>>>>nps because it is a vector machine
>>
>>Bob cut the crap.
>>
>>If cray would execute 1 instruction a cycle then the processors would be
>>10 times slower than any other solution.
>
>Vincent, wanna make a bet? Any amount of money you care to put on it.
>
>The cray issues one instruction per cycle. Of course, you have _no idea_
>of what a vector machine does and how it does it, so you aren't going to
>understand anything about the machine. But one instruction per cycle per
>processor is _it_.
>
>You can find this in any good Cray Reference. I'll be happy to xerox a page
>from the C90 hardware reference manual that gives this info.
>
>Next, do you understand the difference between an _instruction_ and an
>_operation_? Didn't think so. The cray has a set of vector instructions
>where _one_ instruction produces multiple results by operating on a vector.
>But it can't _issue_ more than one instruction per cycle.
>It is possible that by issuing multiple consecutive instructions, you "chain"
>vector functional units together and produce multiple _operations_ per cycle.
>But _not_ multiple instructions.
>
>Why don't you try to talk about something you know something about, if there
>is such a topic? And stop trying to talk "cray" to someone that has actually
>_used_ them for 20+ years?
>
>>
>>Yet everyone loves crays because they are vector processors which can do up to
>>29 instructions a cycle.
>
>Nope.
>
>One instruction per cycle. Try this on for size:
>
>Cray Y-MP C90 System Programmer Reference Manual, CSM-0500-000
>
>"A fetch sequence begins immediately and transfers a block of instructions
>from memory to an instruction buffer. The issue sequence then selects the
>instruction indicated by the program address (P) register, decodes it,
>determines whether the required registers or functional units are available,
>and if so, allows the instruction to be executed.
>
>As the instruction executes, the P register increments, causing a new
>instruction to be selected from the instruction buffer."
>
>The above happens _once_ per processor cycle.
>
>Again, you don't understand what vector processing is all about.
>
>>
>>Even a P5/100 would have been faster than a cray because it can do 2
>>instructions a clock at 100Mhz.
>
>
>So? How long would it take that P5/100 to execute (say) a floating point
>add? The cray does one in 3 cycles. But if it is a vector instruction,
>after the first result pops out after 3 cycles, the next result pops out
>one cycle later, and this continues until the vector has been completely
>processed.
>
>Can your P5 do one floating add per cycle? Didn't think so. After you
>issue several floating point vector instructions (here is an example):
>
>    v0 v1+v2
>    v3 v4+v5
>    v6 v0*v3
>
>after three cycles, we have three instructions being executed, one issued
>per cycle.
>after three cycles, the first v0 value is completed and a new
>one is completed every cycle after that. After 4 cycles, the first v3 value
>is completed and one is completed every cycle after that. After 8 cycles,
>the first v6 value is completed and one every cycle after that. From this
>point forward, we are doing two floating adds and one floating multiply
>every clock cycle. Can your P5 do that?
>
>The cray was _not_ a fast scalar machine. Again something you don't understand.
>It _is_ one hell of a fast vector machine, if you would only look.
>
>
>>
>>You know that a cray can do 29 and i do. So cut this incredible nonsense right
>>here.
>
>You are producing the nonsense. I just quoted _directly_ from the C90
>manual and that was the machine you were quoting for my 500K nodes per
>second.
>
>
>>
>>If you would have vectorized cray blitz correctly it would have run of course
>>faster than 500k nps. More like 5MLN nps at a 16 processor 100Mhz cray.
>
>you have no idea what "vectorized CB correctly" means, obviously, since you
>don't have a clue what "vectorized" means as you have shown many times over
>the past 8 years.
>
>Grow up and learn to understand before spouting nonsense.
>
>
>>
>>Thank you,
>>Vincent
>>
>>>Again, you make up numbers that have nothing to do with reality. A Cray
>>>can issue one instruction per cycle. The C90 I used for the ICCA DTS
>>>article had a clock cycle time of 4.167 nanoseconds, the standard C90 clock
>>>speed. That is about 250 million instructions per second per processor. With
>>>16 processors, that is 4 billion, not your mythical 46.4 billion. How about
>>>you start writing about things you know something about, and stop making stuff
>>>up about things you don't have a clue about?
>>>
>>>>
>>>>Something capable of 4.8G instructions a cycle you get 1 MLN nps because it is a
>>>>x86 processor.
>>>
>>>
>>>Pure garbage calculations don't convince anybody of anything.
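
The chaining timeline in the quoted v0/v3/v6 example (first v0 result after 3 cycles, first v3 after 4, first v6 after 8, then one result per cycle from each) can be sketched in a few lines of Python. This is only a toy model, not a Cray simulator. The names FIRST_RESULT_CYCLE, VECTOR_LENGTH, ops_per_cycle and peak_issue_rate are made up for illustration, and the vector length of 64 is an assumed value that is not stated anywhere in the thread. The point is just to count issued instructions separately from completed operations, which is the distinction being argued about.

```python
# Toy model of the quoted v0/v3/v6 chaining example. Not cycle-accurate.

# Cycle at which each vector instruction delivers its first result,
# taken from the figures quoted in the post (3, 4 and 8 cycles).
FIRST_RESULT_CYCLE = {"v0": 3, "v3": 4, "v6": 8}

# ASSUMPTION: elements per vector register; illustrative value only.
VECTOR_LENGTH = 64


def ops_per_cycle(first_result_cycle, vector_length, cycles):
    """Count results (operations) completed in each of cycles 1..cycles.

    Simplification: once an instruction delivers its first result, it
    delivers exactly one more every cycle until the vector is done.
    """
    counts = []
    for cycle in range(1, cycles + 1):
        active = [
            name
            for name, first in first_result_cycle.items()
            if first <= cycle < first + vector_length
        ]
        counts.append(len(active))
    return counts


def peak_issue_rate(clock_ns, processors):
    """Instructions per second at one issue per cycle per processor."""
    return processors / (clock_ns * 1e-9)


if __name__ == "__main__":
    timeline = ops_per_cycle(FIRST_RESULT_CYCLE, VECTOR_LENGTH, 12)
    print("instructions issued:", len(FIRST_RESULT_CYCLE))
    print("operations per cycle:", timeline)
    print("peak issue rate, 16 CPUs at 4.167 ns: %.2e/s" % peak_issue_rate(4.167, 16))
```

Under these assumptions the model reports three operations per cycle from cycle 8 onward (two adds and one multiply), although only three instructions were ever issued. The issue-rate helper gives roughly 3.8 billion instructions per second for 16 processors at 4.167 ns, in line with the "4 billion" figure quoted above.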
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.