Author: Robert Hyatt
Date: 19:57:07 10/16/03
On October 15, 2003 at 14:07:08, Mridul Muralidharan wrote:

>Hi,
>
>  Mixed up NEC while writing about cray.
>Cray C90 :
>cycle time : 4.0 , pipes : 2 , flops/cycle : 4 , words/cycle : 4 + 2
>
>Regards
>Mridul
>

The C90 can actually go beyond 4 FLOPS/cycle, but only for oddball cases. It can only read 4 words and write 2 words per cycle, which means that for reasonable matrix sizes, it can only fetch and operate on four words and write two results back. a[i] = b[i]+c[i]; is a classic example: the C90 fetches two b's and two c's, does the adds, and writes two a's every cycle. You can chain this to handle a[i] = b[i]*c[i] + C; which fetches two b's and two c's, multiplies them together, then adds in a constant C (or another vector that has been pre-computed and is already in a vector register). That gives 4 FLOPS/cycle.

But while that is running you can also do a scalar FLOP every cycle, making 5. And if you can use the vector shift unit, you can do 6 vector FLOPS/cycle plus one scalar FLOP.

The theoretical peak is always listed as 1.5 GFLOPS because, if we are ripping off vector ops, we can't fetch scalar ops very easily, so we have to depend on the scalar stuff already being in registers and instruction buffers. It can happen, but not very often on real codes. I posted a link from the Cray web site to a case that drove a C90 at 1.3+ GFLOPS (single CPU).

>
>On October 15, 2003 at 13:43:03, Mridul Muralidharan wrote:
>
>>Hi,
>>
>>  I think there could be a possible unintentional error in your statements.
>>
>>If you see evolution of cray :
>>1) better cycle times (from 12.5ns in the Cray 1 down to 2.9ns in the SX 3) and
>>2) higher number of floating point operations initiated per cycle - from 2 flops
>>cycle in the Cray 1 up to 4 in the Cray C 90 and 16 in the SX 3.
>>
>>Regards
>>Mridul
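
To make the arithmetic in the reply above concrete, here is a minimal sketch in C of the two loops it mentions, plus a helper that turns a FLOPS/cycle count into a rate. The function names (vec_add, vec_mul_add, mflops) are made up for illustration, and the 4.0 ns cycle time is taken from the quoted C90 figures. This is plain scalar C: it only shows what the loops compute, not how a vectorizing Cray compiler would schedule them.

```c
#include <stddef.h>

/* a[i] = b[i] + c[i]: one add per result.
 * Per the post, the C90 writes two results per cycle here. */
void vec_add(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* a[i] = b[i] * c[i] + k: a multiply chained into an add,
 * so two FLOPs per result. k stands in for the constant C. */
void vec_mul_add(double *a, const double *b, const double *c,
                 double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i] + k;
}

/* Convert FLOPs per cycle into MFLOPS for a cycle time given in
 * picoseconds (assumed: 4000 ps, i.e. the 4.0 ns quoted above). */
unsigned long mflops(unsigned long flops_per_cycle, unsigned long cycle_ps)
{
    return flops_per_cycle * 1000000UL / cycle_ps;
}
```

Under the 4.0 ns assumption, mflops(4, 4000) gives 1000 and mflops(6, 4000) gives 1500, so the chained multiply-add case works out to 1.0 GFLOPS and the 6 vector FLOPS/cycle case to 1.5 GFLOPS. Adding one scalar FLOP per cycle to those would give 1250 and 1750 MFLOPS respectively.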
Last modified: Thu, 15 Apr 21 08:11:13 -0700