Author: Robert Hyatt
Date: 14:55:18 10/15/03
On October 15, 2003 at 16:16:47, Vincent Diepeveen wrote:

>On October 15, 2003 at 16:13:00, Robert Hyatt wrote:
>
>>On October 15, 2003 at 13:43:03, Mridul Muralidharan wrote:
>>
>>>Hi,
>>>
>>>  I think there could be a possible unintentional error in your statements.
>>>
>>>If you see evolution of cray :
>>>1) better cycle times (from 12.5ns in the Cray 1 down to 2.9ns in the SX 3) and
>>>2) higher number of floating point operations initiated per cycle - from 2 flops
>>>cycle in the Cray 1 up to 4 in the Cray C 90 and 16 in the SX 3.
>>>
>>>Regards
>>>Mridul
>>
>>
>>We are not talking about floating point ops.  We are talking about instructions
>>executed. (issued in Vincent's terminology).  That has _always_ been one
>>instruction per cycle, since the first Cray-1, through the T90.  The C90
>
>15 simultaneously at a single C90 processor.

Aha, now you are finally beginning to read.

As I said, I can chain up to three vector functional units together. After the
three cycles needed to issue the three vector instructions, one clock at a
time, I am doing three vector operations every cycle. But the CPU has two
vector pipes: one handles even-numbered elements, the other handles
odd-numbered elements. That lets me get 6 vector operations per cycle done
after the start-up time. I can tack on one scalar operation for a total of 7
floating point operations per cycle.

Where you get your number of 15 I don't know. The C90 CPU is rated at a rough
theoretical max of 1.5 GFLOPS. That is about 6 floating point operations per
cycle at a 4ns cycle time. 15 doesn't match any benchmark I know of.

Here is some output from the Hardware Performance Monitor for a "special-case"
piece of code that drives the hardware as fast as it can, even though it is not
practical code:

Million inst/sec (MIPS)   :      19.41   Instructions      :   22006465
Avg. clock periods/inst   :      12.36
% CP holding issue        :      85.64   CP holding issue  :  232994424
Inst.buffer fetches/sec   :      0.00M   Inst.buf. fetches :        550
Floating ops/sec          :   1354.99M   F.P. ops          : 1536000769
Vector Floating ops/sec   :   1354.99M   Vec F.P. ops      : 1536000768

Notice only 19.4 MIPS. That is only one instruction every 12.36 clock cycles on
average, typical of vector code, since the instructions run for _many_ cycles
and each instruction produces 128 results. Notice 1354 MFLOPS (1.35 GFLOPS),
which is near the theoretical peak of 1.5 GFLOPS. (Note that these numbers are
for a single CPU. Multiply by 16 to get the real total for the entire machine.)

So, your numbers are getting closer (dropping from 19 instructions per cycle
down to 15 operations per cycle). But you are still way high. And my numbers
aren't made up. They can be found here:

http://www.cray.com/craydoc/manuals/004-2182-002/html-004-2182-002/zfixed1qzdhueg.html#U8WKLCHRI

>
>>can do more than four floating point operations per cycle.  A single vector
>>operation does two per cycle.  You can chain multiple vector instructions
>>together to go beyond that.  The theoretical limit ought to be beyond 8 but
>>I will check my C90 manual when I get back in the office...
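
The arithmetic in the reply above can be checked with a short script. This is only a sketch: the counter values are copied from the Hardware Performance Monitor output quoted in the post, the 4 ns cycle time is the round figure used there, and the function names are made up for illustration. It recomputes floating point operations per cycle from the raw counters and compares the result with the chained ceiling described in the post (3 chained vector units x 2 pipes + 1 scalar operation).

```python
# Counters copied from the HPM output quoted in the post (single CPU).
INSTRUCTIONS = 22_006_465
CLOCKS_PER_INST = 12.36
FP_OPS = 1_536_000_769
MIPS = 19.41

# Figures quoted in the post; 4 ns is a rounded cycle time.
RATED_PEAK_GFLOPS = 1.5
CYCLE_NS = 4.0


def chained_peak(chained_units=3, pipes=2, scalar_ops=1):
    """Ceiling from the post: chained vector units times pipes, plus scalar."""
    return chained_units * pipes + scalar_ops


def rated_flops_per_cycle(gflops, cycle_ns):
    """Convert a rated GFLOPS figure into operations per clock cycle."""
    return gflops * 1e9 * cycle_ns * 1e-9


def measured_flops_per_cycle(fp_ops, instructions, clocks_per_inst):
    """Floating point ops divided by total clock periods consumed."""
    total_clocks = instructions * clocks_per_inst
    return fp_ops / total_clocks


def implied_clock_mhz(mips, clocks_per_inst):
    """Clock rate implied by the counters: instructions/sec * clocks/inst."""
    return mips * clocks_per_inst


if __name__ == "__main__":
    measured = measured_flops_per_cycle(FP_OPS, INSTRUCTIONS, CLOCKS_PER_INST)
    print("chained ceiling (ops/cycle):", chained_peak())
    print("rated peak (ops/cycle):     ",
          round(rated_flops_per_cycle(RATED_PEAK_GFLOPS, CYCLE_NS), 2))
    print("measured (ops/cycle):       ", round(measured, 2))
    print("FP ops per instruction:     ", round(FP_OPS / INSTRUCTIONS, 1))
    print("implied clock (MHz):        ",
          round(implied_clock_mhz(MIPS, CLOCKS_PER_INST), 1))
```

With these counters the measured rate comes out at roughly 5.6 floating point operations per clock, below both the rated figure of about 6 and the chained ceiling of 7, and nowhere near 15. The MIPS and clocks-per-instruction counters together imply a clock of about 240 MHz, so the 4 ns cycle time used in the post is a rounded value.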
Last modified: Thu, 15 Apr 21 08:11:13 -0700