Author: Anthony Cozzie
Date: 08:50:10 10/15/03
On October 15, 2003 at 10:51:29, Robert Hyatt wrote:

>
>I blew one bit of the previous calculation. The C90 is a "super-scalar" sort
>of vector machine. Where I said "one floating add per cycle" change that to
>two. A single vector instruction does _two_ operations per cycle, not one, and
>I had simply failed to note that. That was the main change from the older
>X-MP and Y-MP, that was introduced on the C90. Obviously it makes vector
>performance 2x faster even without the clock speed improvement. IE for my
>example:
>
>   v0 = v1+v2
>   v3 = v4+v5
>   v6 = v0*v3
>
>that code will produce _six_ results per cycle, once the chained vector
>pipeline is filled. Not the _three_ I had given.
>
>_that_ is why the Cray buries the PC in _any_ program that can use vectors.
>Even though the C90 only runs at 250 mhz. The T90 runs that up to 500mhz,
>and the Cray-3 doubled it again to 1ghz. But all mhz/ghz are _not_ created
>"equal" for those that understand vector operations.
>
>The C90 is a 250mhz machine, not the 100 Vincent pulls from you-know-where.
>But no 2500mhz 80x86 can produce 6 64-bit IEEE floating point operations
>every 4 nanoseconds.
>
>I don't know how to explain it better to someone that simply doesn't have a
>single scintilla of background on understanding the concept of "a vector
>machine."

What about P4 with SSE2? According to my P4 optimization manual, the P4 can do 2 DP adds in parallel, with latency = 4. I *think* that the SSE2 ALUs in the P4 are fully pipelined, so that means 4 FP ops/clock. Obviously it can only do this on vectorized data, but the same constraint applies to the Cray.

Unfortunately, the P4 was built to do vectorized multimedia-ish stuff, not computer chess :(

anthony
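To put the two sets of numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only multiplies clock rate by results per cycle, using the figures quoted in this thread: 250 MHz and 6 results/cycle for the C90, and a 2500 MHz P4 with my unverified guess of 4 FP ops/clock. The function names are made up for illustration, and these are theoretical peaks that ignore memory bandwidth and pipeline fill.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_mflops(clock_mhz, flops_per_cycle):
    """Theoretical peak in MFLOPS: clock (MHz) times FP results per cycle."""
    return clock_mhz * flops_per_cycle


# Figures taken from the thread; the P4 ops/clock value is an assumption.
C90_CLOCK_MHZ = 250
C90_RESULTS_PER_CYCLE = 6   # three chained vector ops, two results each
P4_CLOCK_MHZ = 2500
P4_FLOPS_PER_CYCLE = 4      # assumed: fully pipelined SSE2 units

c90_cycle_ns = cycle_time_ns(C90_CLOCK_MHZ)
c90_peak = peak_mflops(C90_CLOCK_MHZ, C90_RESULTS_PER_CYCLE)
p4_peak = peak_mflops(P4_CLOCK_MHZ, P4_FLOPS_PER_CYCLE)

print("C90 cycle time (ns):", c90_cycle_ns)
print("C90 peak (MFLOPS):  ", c90_peak)
print("P4 peak (MFLOPS):   ", p4_peak)
```

If the 4 ops/clock guess is off, only P4_FLOPS_PER_CYCLE needs to change; the C90 side just restates the "6 results every 4 nanoseconds" figure from the quoted post.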
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.