Computer Chess Club Archives


Subject: Re: Multiple processors on one chip...

Author: Robert Hyatt

Date: 20:28:55 03/05/00


On March 05, 2000 at 20:56:05, Tom Kerrigan wrote:

>On March 05, 2000 at 10:19:59, Robert Hyatt wrote:
>
>>>I don't see why you want to bring the cache into this, if you just want to
>>>compare the cores. (Which I do.)
>>two reasons.  (1) if a program fits totally in cache, you are testing one
>>aspect of the cpu.  If the program doesn't fit into cache, you add memory
>>bandwidth and cache miss handling into the equation.  (2) "core" speed is
>>only important for a program that fits completely in cache.  I am not aware
>
>This is straying from my original question. I'm not trying to prove that the P5
>is better than the P6, because it isn't. I just want to know what kind of
>concrete performance improvement we are getting from these nifty new
>microprocessor features.
>
>>>>Which means no register jams occur in the program.  For more complex programs,
>>>>the renaming logic in the P6 avoids many register jams/spills and does much
>>>>better keeping both pipes filled.
>>>Do you have any proof of this?
>>This is a trivial thing to consider.  If you have 1 register, try to figure
>>out a way to keep 2 instruction pipes busy, since both can't update that one
>>register in one cycle, nor can one update while the other reads.  If you have
>
>Yes, I know exactly how all of this stuff works. I'm still only interested in
>the concrete improvement (and not necessarily for the x86). I thought I made
>this extremely clear a few posts ago. If you don't have any actual data, I'm a
>little confused as to why you're replying to my posts.
>
>-Tom


I'm not www.intel.com.  Go there and read.  You will find the data you
want, with a detailed explanation of both processor architectures and real
details on why register renaming matters.  I think MIPS was the first to
use this idea, and it makes sense.

A simple case is something like this:

movl  anything, eax
<modify eax>
movl  eax, anything

movl  anythingelse, eax
<modify eax>
movl eax, anythingelse

The above is very common in a C program.  And the P6 eats it alive,
renaming the second eax so that both of the <modify eax> instruction
streams can proceed thru two pipes in parallel.  The P5 runs such code
on one pipe.
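
Here is a toy sketch of that effect in Python.  It is not a model of either
real chip: the two-wide issue limit, the one-cycle latency, the memory names
"a" and "b", and the physical register names p0, p1, ... are all assumptions
made up for illustration.  It only counts how many cycles the six instructions
above need when every reuse of eax is treated as a dependency, versus when
each write to eax gets a fresh register.

```python
REGISTERS = {"eax"}  # architectural registers; every other name is memory

# (destination, sources) for the six instructions in the example above.
# "a" and "b" are hypothetical stand-ins for "anything" and "anythingelse".
program = [
    ("eax", ("a",)),    # movl a, eax
    ("eax", ("eax",)),  # <modify eax>
    ("a",   ("eax",)),  # movl eax, a
    ("eax", ("b",)),    # movl b, eax
    ("eax", ("eax",)),  # <modify eax>
    ("b",   ("eax",)),  # movl eax, b
]

def rename(prog):
    """Give every register write a fresh physical register (p0, p1, ...)."""
    mapping = {}   # architectural register -> current physical register
    renamed = []
    for dest, srcs in prog:
        # Sources read whatever physical register currently holds the value.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        if dest in REGISTERS:
            mapping[dest] = "p%d" % len(renamed)
            dest = mapping[dest]
        renamed.append((dest, new_srcs))
    return renamed

def cycles(prog, width=2):
    """Greedy schedule: at most `width` instructions per cycle, and an
    instruction waits for any earlier one it conflicts with (read-after-write,
    write-after-write, or write-after-read on the same name)."""
    issue = []   # cycle assigned to each instruction
    used = {}    # cycle -> number of instructions already placed there
    for j, (dest, srcs) in enumerate(prog):
        earliest = 0
        for i in range(j):
            pdest, psrcs = prog[i]
            if pdest in srcs or pdest == dest or dest in psrcs:
                earliest = max(earliest, issue[i] + 1)
        c = earliest
        while used.get(c, 0) >= width:
            c += 1
        used[c] = used.get(c, 0) + 1
        issue.append(c)
    return max(issue, default=-1) + 1

print(cycles(program), cycles(rename(program)))
```

Under these assumptions the toy reports 6 cycles without renaming and 3 with
it.  That says nothing about real P5 or P6 timings; it only shows that the
two eax sequences become independent once the second one gets its own
register.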

The entire point was to expose more hidden parallel instructions so that they
can be run thru both pipes and keep them busy.  However, I am really not up to
taking some asm output from a compiler and trying to get exact numbers on
how much more gets executed in parallel with renaming than without.

As I said, find an experienced assembly programmer and ask him about it...  And
this programmer doesn't really need to be an x86 programmer, as the 'problem'
exists in most architectures until you get to instruction sets like the SPARC
with 32 visible registers.

And it can happen there in very complex code.  Crafty broke Sun's optimizer
pretty badly when we first started the SPEC testing cycle.  Sometimes even 32
registers aren't enough.  Much less 'eight'...



Last modified: Thu, 15 Apr 21 08:11:13 -0700