Author: Robert Hyatt
Date: 20:28:55 03/05/00
On March 05, 2000 at 20:56:05, Tom Kerrigan wrote:

>On March 05, 2000 at 10:19:59, Robert Hyatt wrote:
>
>>>I don't see why you want to bring the cache into this, if you just want to
>>>compare the cores. (Which I do.)
>>two reasons. (1) if a program fits totally in cache, you are testing one
>>aspect of the cpu. If the program doesn't fit into cache, you add memory
>>bandwidth and cache miss handling into the equation. (2) "core" speed is
>>only important for a program that fits completely in cache. I am not aware
>
>This is straying from my original question. I'm not trying to prove that the P5
>is better than the P6, because it isn't. I just want to know what kind of
>concrete performance improvement we are getting from these nifty new
>microprocessor features.
>
>>>>Which means no register jams occur in the program. For more complex programs,
>>>>the renaming logic in the P6 avoids many register jams/spills and does much
>>>>better keeping both pipes filled.
>>>Do you have any proof of this?
>>This is a trivial thing to consider. If you have 1 register, try to figure
>>out a way to keep 2 instruction pipes busy, since both can't update that one
>>register in one cycle, nor can one update while the other reads. If you have
>
>Yes, I know exactly how all of this stuff works. I'm still only interested in
>the concrete improvement (and not necessarily for the x86). I thought I made
>this extremely clear a few posts ago. If you don't have any actual data, I'm a
>little confused as to why you're replying to my posts.
>
>-Tom

I'm not www.intel.com. Go there, and read. You will find the data you want, with a detailed explanation of both processor architectures and real details on why the register renaming is an issue. I think MIPS was the first to use this idea, but it makes sense.

Simple case is something like this:

    movl anything, eax
    <modify eax>
    movl eax, anything
    movl anythingelse, eax
    <modify eax>
    movl eax, anythingelse

The above is very common in a C program.
And the P6 eats it alive, renaming the second eax so that both of the <modify eax> instruction streams can proceed through two pipes in parallel. The P5 runs such code through one pipe. The entire point was to expose more hidden parallel instructions so that they can be run through both pipes and keep them busy.

However, I am really not up to taking some asm output from a compiler and trying to get exact numbers on how much more gets executed in parallel with renaming than without. As I said, find an experienced assembly programmer and ask him about it... And this programmer doesn't really need to be an x86 programmer, as the 'problem' exists in most architectures until you get to instruction sets like the SPARC with 32 visible registers. And it can happen there in very complex code. Crafty broke Sun's optimizer pretty badly when we first started the SPEC testing cycle. Sometimes even 32 registers aren't enough. Much less 'eight'...
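The eax example can be put into a toy model. This is only a sketch under loud assumptions, not a measurement of any real P5 or P6: every instruction takes one cycle, issue width is unlimited, memory dependencies are ignored, and an instruction that depends on another must issue in a later cycle. The names `schedule_depth` and `EAX_EXAMPLE` are made up for illustration. Without renaming, reuse of the same register name (write-after-write and write-after-read) forces ordering; with renaming, only true read-after-write dependencies remain.

```python
def schedule_depth(instrs, rename):
    """Return the length in cycles of the longest dependency chain.

    instrs: list of (dest, srcs) tuples; dest is None for a store.
    rename: if True, only true (read-after-write) dependencies count,
            which is the idealized effect of register renaming.
    """
    last_write = {}   # register -> cycle of its most recent writer
    last_reads = {}   # register -> cycles of readers since that write
    depth = 0
    for dest, srcs in instrs:
        ready = 0
        for reg in srcs:
            # RAW: wait for the producer of each source value.
            ready = max(ready, last_write.get(reg, 0))
        if dest is not None and not rename:
            # WAW: wait for the previous writer of the same name.
            ready = max(ready, last_write.get(dest, 0))
            # WAR: wait for everyone still reading the old value.
            ready = max([ready] + last_reads.get(dest, []))
        cycle = ready + 1
        for reg in srcs:
            last_reads.setdefault(reg, []).append(cycle)
        if dest is not None:
            last_write[dest] = cycle
            last_reads[dest] = []   # new value, no readers yet
        depth = max(depth, cycle)
    return depth


# The six-instruction sequence from the post, as (dest, srcs).
EAX_EXAMPLE = [
    ("eax", []),        # movl anything, eax
    ("eax", ["eax"]),   # <modify eax>
    (None, ["eax"]),    # movl eax, anything
    ("eax", []),        # movl anythingelse, eax
    ("eax", ["eax"]),   # <modify eax>
    (None, ["eax"]),    # movl eax, anythingelse
]

print(schedule_depth(EAX_EXAMPLE, rename=False))  # 6
print(schedule_depth(EAX_EXAMPLE, rename=True))   # 3
```

In this model the six instructions form one serial chain of six cycles without renaming, and two independent chains of three cycles with it, which is the "both pipes busy" case described above. Real hardware adds decode limits, latencies and memory effects that this sketch leaves out.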
Last modified: Thu, 15 Apr 21 08:11:13 -0700