Author: Gerd Isenberg
Date: 15:26:35 09/23/03
>><snip>
>
>you still don't get it. you're betting at the wrong horse!
>

no, i'll try to use some otherwise dead resources en passant.

>SSE2 simply will *never* execute more than 1 instruction a cycle.

not sure, i guess the 2 cycles latency covers decoding and execution. So while some instructions are being decoded, others will execute. A cycle has some edges. The drawback is double direct path, which requires two units (FADD/FMUL) simultaneously or one unit sequentially. I guess simultaneous execution of the two mops gives the best-case instruction latency of 2 cycles; sequential execution takes a bit more, but not double the time.

If you have really independent instruction chains using up to eight register pairs of the 16 available XMM registers, the instruction throughput is like doing four or maybe more SSE2 mops, or up to two SSE2 instructions, in parallel. With future hammer four - like my current MMX chains on Athlon (P4 sucks).

>
>This where it is trivial that at the next generation of processors, after
>opteron, the IPC for integer instructions will go up and up.
>
>So you always lose relative to integer performance!

They don't exclude, they (may) complement each other. Because it works purely in registers, SSE2 Kogge-Stone also hides some memory latency of the independent leading moves from memory to register here and there. For P4, considering hyperthreading, it may be a bad idea to keep really all pipes perfectly busy in one thread - but for AMD?

Gerd

>
>Best regards,
>Vincent
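For readers who have not seen the technique discussed here, a minimal sketch of a Kogge-Stone occluded fill on two bitboards at once may help. It is not Gerd's code: the type `xmm_model` and the functions `shl2`, `and2`, `or2` and `north_fill2` are hypothetical names, and the 128-bit XMM register is only modelled in portable C as a pair of 64-bit lanes. A real SSE2 version would presumably map these helpers to `_mm_slli_epi64`, `_mm_and_si128` and `_mm_or_si128`. The sketch assumes bit 0 = a1, bit 8 = a2, so a left shift by 8 moves one rank north.

```c
#include <stdint.h>

/* Hypothetical model of one XMM register: two independent 64-bit bitboards. */
typedef struct { uint64_t lane[2]; } xmm_model;

/* Shift both lanes left by the same count (models a 64-bit lane shift). */
xmm_model shl2(xmm_model v, int n) {
    xmm_model r = {{ v.lane[0] << n, v.lane[1] << n }};
    return r;
}

/* Bitwise AND of both lanes. */
xmm_model and2(xmm_model a, xmm_model b) {
    xmm_model r = {{ a.lane[0] & b.lane[0], a.lane[1] & b.lane[1] }};
    return r;
}

/* Bitwise OR of both lanes. */
xmm_model or2(xmm_model a, xmm_model b) {
    xmm_model r = {{ a.lane[0] | b.lane[0], a.lane[1] | b.lane[1] }};
    return r;
}

/*
 * Kogge-Stone occluded fill to the north, on two bitboards in parallel.
 * g = generators (sliding pieces), p = propagators (empty squares).
 * Three doubling steps (8, 16, 32) cover all seven possible ranks.
 * Everything stays in "registers": no table lookups, no memory access.
 */
xmm_model north_fill2(xmm_model g, xmm_model p) {
    g = or2(g, and2(p, shl2(g, 8)));
    p = and2(p, shl2(p, 8));
    g = or2(g, and2(p, shl2(g, 16)));
    p = and2(p, shl2(p, 16));
    g = or2(g, and2(p, shl2(g, 32)));
    return g;
}
```

Several such fills for different directions have no data dependencies on each other, which is the kind of independent instruction chain the post is talking about. How well a given CPU overlaps them is exactly the open question in this thread.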
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.