Author: Robert Hyatt
Date: 07:41:37 10/15/03
On October 15, 2003 at 10:01:44, Vincent Diepeveen wrote:

>On October 15, 2003 at 09:32:10, Robert Hyatt wrote:
>
>>On October 14, 2003 at 17:29:18, Vincent Diepeveen wrote:
>>
>>>On October 14, 2003 at 16:18:28, Robert Hyatt wrote:
>>>
>>>>On October 14, 2003 at 14:29:36, Gerd Isenberg wrote:
>>>>
>>>>>On October 14, 2003 at 14:15:33, Vincent Diepeveen wrote:
>>>>>
>>>>>>On October 14, 2003 at 14:13:08, Gerd Isenberg wrote:
>>>>>>
>>>>>>>On October 14, 2003 at 10:07:10, Ricardo Gibert wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>http://www.wired.com/news/technology/0,1282,60791,00.html
>>>>>>>>
>>>>>>>>Can this be productively used in a chess program?
>>>>>>>
>>>>>>>I don't know, similar hardware resources may be more productive for chess, if
>>>>>>>implemented as hyperthreading devices. I guess it's a kind of further
>>>>>>>development of SSE and AltiVec technology. With huge register files
>>>>>>>(N * 64 * 64|128|256-bit?) and probably SIMD-wise integer instructions
>>>>>>>(including popcount?) and fast memory interface, I can imagine that it is
>>>>>>>useful for a lot of nice things, like some eval passes, e.g. a first square
>>>>>>>wise and a final scalar product pass. And fill-attack generation, e.g. square
>>>>>>>wise in all 16 directions with a specialized dumb fill routine.
>>>>>>>
>>>>>>>Gerd
>>>>>>
>>>>>>this is just floating point arrays.
>>>>>
>>>>>Aha, well may be a matter of interpretation.
>>>>>I haven't seen any instruction set yet.
>>>>>
>>>>>On the other hand, if float and double arithmetic becomes as fast (or faster) as
>>>>>integer, why not use it for eval purposes?
>>>>>
>>>>>Gerd
>>>>
>>>>
>>>>Correct. We did this on the Cray. FP was very fast there and it frees
>>>>up integer registers for addresses and array indices...
>>>
>>>That's of course true however at 16 processors of 100Mhz you reached 500k nodes
>>>a second with cray blitz.
>>>
>>>Each Cray processor can issue up to 29 instructions a cycle.
>>
>>I have no idea what you are talking about.
>>Each cray processor can issue _one_ instruction per cycle.
>>
>>however, doing vector stuff, in one cycle the machine can do four memory
>>reads and two memory writes (8 byte words) per processor. It can also do
>>multiple things in one cycle with vector chaining, but it never issues more
>>than one instruction per cycle per cpu.
>>
>>I don't know what data you are looking at, but it is wrong.
>>
>>>
>>>Crafty at a 1.6Ghz K7 which can issue up to 3 instructions a cycle gets 1
>>>million nodes a second.
>>>
>>>So something capable of 100M * 16 * 29 = 46.4G instructions a cycle you get 500k
>>>nps because it is a vector machine
>
>Bob cut the crap.
>
>If cray would execute 1 instruction a cycle then the processors would be
>10 times slower than any other solution.

Vincent, wanna make a bet? Any amount of money you care to put on it.

The cray issues one instruction per cycle. Of course, you have _no idea_ of
what a vector machine does and how it does it, so you aren't going to
understand anything about the machine. But one instruction per cycle per
processor is _it_. You can find this in any good Cray Reference. I'll be
happy to xerox a page from the C90 hardware reference manual that gives
this info.

Next, do you understand the difference between an _instruction_ and an
_operation_? Didn't think so. The cray has a set of vector instructions
where _one_ instruction produces multiple results by operating on a vector.
But it can't _issue_ more than one instruction per cycle. It is possible
that by issuing multiple consecutive instructions, you "chain" vector
functional units together and produce multiple _operations_ per cycle. But
_not_ multiple instructions.

Why don't you try to talk about something you know something about, if there
is such a topic? And stop trying to talk "cray" to someone that has actually
_used_ them for 20+ years?

>
>Yet everyone loves crays because they are vector processors which can do up to
>29 instructions a cycle.

Nope.
One instruction per cycle. Try this on for size:

Cray Y-MP C90 System Programmer Reference Manual, CSM-0500-000

"A fetch sequence begins immediately and transfers a block of instructions
from memory to an instruction buffer. The issue sequence then selects the
instruction indicated by the program address (P) register, decodes it,
determines whether the required registers or functional units are available,
and if so, allows the instruction to be executed. As the instruction
executes, the P register increments, causing a new instruction to be selected
from the instruction buffer."

The above happens _once_ per processor cycle. Again, you don't understand
what vector processing is all about.

>
>Even a P5/100 would have been faster than a cray because it can do 2
>instructions a clock at 100Mhz.

So? How long would it take that P5/100 to execute (say) a floating point add?
The cray does one in 3 cycles. But if it is a vector instruction, after the
first result pops out after 3 cycles, the next result pops out one cycle
later, and this continues until the vector has been completely processed.
Can your P5 do one floating add per cycle? Didn't think so.

After you issue several floating point vector instructions (here is an
example):

    v0    v1+v2
    v3    v4+v5
    v6    v0*v3

after three cycles, we have three instructions being executed, one issued per
cycle. After three cycles, the first v0 value is completed and a new one is
completed every cycle after that. After 4 cycles, the first v3 value is
completed and one is completed every cycle after that. After 8 cycles, the
first v6 value is completed and one every cycle after that. From this point
forward, we are doing two floating adds and one floating multiply every clock
cycle. Can your P5 do that?

The cray was _not_ a fast scalar machine. Again something you don't
understand. It _is_ one hell of a fast vector machine, if you would only
look.

>
>You know that a cray can do 29 and i do. So cut this incredible nonsense right
>here.
You are producing the nonsense. I just quoted _directly_ from the C90 manual
and that was the machine you were quoting for my 500K nodes per second.

>
>If you would have vectorized cray blitz correctly it would have run of course
>faster than 500k nps. More like 5MLN nps at a 16 processor 100Mhz cray.

You have no idea what "vectorized CB correctly" means, obviously, since you
don't have a clue what "vectorized" means, as you have shown many times over
the past 8 years.

Grow up and learn to understand before spouting nonsense.

>
>Thank you,
>Vincent
>
>>Again, you make up numbers that have nothing to do with reality. A Cray
>>can issue one instruction per cycle. The C90 I used for the ICCA DTS
>>article had a clock cycle time of 4.167 nanoseconds, the standard C90 clock
>>speed. That is about 250 million instructions per second per processor. With
>>16 processors, that is 4 billion, not your mythical 46.4 billion. How about
>>you start writing about things you know something about, and stop making stuff
>>up about things you don't have a clue about?
>>
>>>
>>>Something capable of 4.8G instructions a cycle you get 1 MLN nps because it is a
>>>x86 processor.
>>
>>
>>Pure garbage calculations don't convince anybody of anything.
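
The instruction-versus-operation distinction argued in this post can be checked with a small timing model. What follows is only a rough Python sketch, not a simulation of real Cray hardware. It assumes one instruction issues per cycle, that a chained instruction starts as soon as the first element of its latest operand is available, a floating add latency of 3 cycles (as stated in the post), and a multiply latency of 4 cycles (inferred from the post's "after 8 cycles" figure, not taken from a Cray manual). The names `VectorOp`, `first_result_cycles`, `results_per_cycle` and `peak_issue_rate`, and the vector length of 64, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

CLOCK_PERIOD_NS = 4.167  # C90 clock cycle time quoted in the post
PROCESSORS = 16          # processor count quoted in the post


@dataclass(frozen=True)
class VectorOp:
    dest: str      # destination vector register
    srcs: tuple    # source vector registers
    latency: int   # cycles from start until the first result (assumed)


# The three-instruction example from the post.
PROGRAM = [
    VectorOp("v0", ("v1", "v2"), 3),  # floating add, 3 cycles (from the post)
    VectorOp("v3", ("v4", "v5"), 3),  # floating add, 3 cycles (from the post)
    VectorOp("v6", ("v0", "v3"), 4),  # floating multiply, 4 cycles (assumption)
]


def first_result_cycles(program):
    """Cycle at which each instruction delivers its first element.

    Instruction i issues at cycle i (one issue per cycle). A chained
    instruction cannot start before the first element of every operand
    produced earlier in the program is available.
    """
    first = {}
    for issue_cycle, op in enumerate(program):
        operand_ready = [first[s] for s in op.srcs if s in first]
        start = max([issue_cycle] + operand_ready)
        first[op.dest] = start + op.latency
    return first


def results_per_cycle(program, cycle, vector_length=64):
    """Number of element results completed at the given cycle.

    Each instruction delivers one element per cycle, starting at its
    first-result cycle, for vector_length elements (assumed length).
    """
    first = first_result_cycles(program)
    return sum(
        1
        for op in program
        if first[op.dest] <= cycle < first[op.dest] + vector_length
    )


def peak_issue_rate(clock_period_ns, processors=1):
    """Peak instructions issued per second at one instruction per cycle."""
    return processors / (clock_period_ns * 1e-9)


if __name__ == "__main__":
    print(first_result_cycles(PROGRAM))
    for cycle in (2, 3, 4, 8):
        print(cycle, results_per_cycle(PROGRAM, cycle))
    print(peak_issue_rate(CLOCK_PERIOD_NS))
    print(peak_issue_rate(CLOCK_PERIOD_NS, PROCESSORS))
```

Under these assumptions the model issues only three instructions in total, yet from cycle 8 onward three element results complete every cycle, which is the point of the example. On the issue-rate side, 1 / 4.167 ns works out to roughly 240 million instructions per second per processor, which the post rounds to "about 250 million" and "4 billion" for 16 processors.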
Last modified: Thu, 15 Apr 21 08:11:13 -0700