Author: Robert Hyatt
Date: 20:02:12 10/16/03
On October 16, 2003 at 13:23:40, Gerd Isenberg wrote:

><snip>
>>Being off by a factor of 2-3 is not very significant when dealing with numbers
>>posted by Vincent. :) a number within a factor of 2-3 for him is an
>>_excellent_ approximation.
>
>Hehe, you and Vincent are really incorporated in doing side blows ;-)

I just get tired of his making up numbers, then arguing about them until it is clear that he is wrong. Then he just disappears and waits until the next fabrication opportunity comes along.

>
>AMD64's 16 128-bit registers became the default floating point registers in
>64-bit Windows - during context switch x87-stack (MMX) is not saved/restored.
>What I found curious about SSE2 is that the xmm-registers have a kind of type
>state (int,float,double) and different instruction sets for load/store,
>arithmetic of course, but even for bitwise and/or/xor (to reset/set/toggle sign
>bit in floats?). The instructions use the same execution units. "False" type
>instructions result in some penalty cycles.
>
>> However, that notwithstanding, the C90 can certainly
>>produce 6 floating point results every 4 nanoseconds, as a theoretical peak.
>
>A vector of 128*128-bit each, may be including memory access?

On the C90, FP is 64 bits normally. I've never done 128 bit FP on the Cray so I don't know how it is done (it is done for those needing the precision, however). But yes, the C90 can do 64 bit * 64 bit in one cycle including the memory accesses. On a good vector machine, you can't get away from the latency, but after the first word, consecutive words in a vector have a latency of one cycle. That means after the first word of a vector read shows up, successive words show up on successive cycles with no delays. That's what makes 'em so fast...

>
>> It
>>can sustain 4 (about 1 GFLOPS) with many codes, and go beyond that for certain
>>special cases. It is hard to imagine a PC coming anywhere near that, even
>>comparing today's PC with the C90 which is 13 years old. There are much faster
>>Crays around today...
>
>Maybe the performance timegap between super computer and PC becomes closer.
>Is there any trend?

Not really. Scalar instructions are way faster on a PC. But the vector stuff blows them away. I.e., can you imagine a PC with an effective memory latency of zero? That is what a super-computer looks like in vector mode on long vectors...
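
To put rough numbers on the vector-latency point and on the peak and sustained figures quoted above, here is a minimal Python sketch. The function names, the 20-cycle startup latency, and the 128-word vector length are assumptions picked for illustration, not C90 measurements. Only the "6 results per 4 ns" and "4 results per 4 ns" figures come from the post.

```python
def vector_read_cycles(n_words, startup_latency):
    """Cycles to stream n_words when the first word costs the full
    startup latency and each later word arrives one cycle after the
    previous one (the pipelined vector case described in the post)."""
    if n_words <= 0:
        return 0
    return startup_latency + (n_words - 1)


def unpipelined_read_cycles(n_words, startup_latency):
    """Cycles when every word pays the full latency on its own
    (a deliberately pessimistic baseline, assumed for comparison)."""
    if n_words <= 0:
        return 0
    return n_words * startup_latency


def gflops(results_per_cycle, cycle_ns):
    """Results per nanosecond, which is numerically the same as GFLOPS."""
    return results_per_cycle / cycle_ns


if __name__ == "__main__":
    n, latency = 128, 20  # hypothetical vector length and startup latency
    piped = vector_read_cycles(n, latency)
    plain = unpipelined_read_cycles(n, latency)
    print("pipelined cycles:  ", piped)
    print("unpipelined cycles:", plain)
    print("amortized cycles per word:", piped / n)
    print("peak GFLOPS:     ", gflops(6, 4))  # 6 results every 4 ns
    print("sustained GFLOPS:", gflops(4, 4))  # 4 results every 4 ns
```

With these assumed numbers the pipelined read takes 147 cycles against 2560 for the unpipelined baseline, so the amortized cost is a little over one cycle per word. The longer the vector, the closer that gets to one cycle per word, which is the sense in which the startup latency effectively disappears on long vectors.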
Last modified: Thu, 15 Apr 21 08:11:13 -0700