Author: Tom Kerrigan
Date: 13:23:26 05/24/03
On May 24, 2003 at 01:12:34, Robert Hyatt wrote:
>On May 23, 2003 at 23:45:09, Tom Kerrigan wrote:
>>First of all, okay, sure, let's say you're right and only SOME of the resources
>>are split. Even if only the write combine buffers are split, and you have a
>>program that works great with 4 buffers but starts "thrashing" with 3 buffers,
>>don't you see how that would cause the program to run inordinately slow with HT
>>on? Or if the processor can extract great parallelism from the instruction
>>stream with an n entry reorder window but very little parallelism with an n/2
>>window?
>
>Back to _real_ data. I run crafty twice. I get a different level of
>performance than if I run crafty _once_ using two threads. Yet both have
>the same instruction mix. Locks are infrequently used so that isn't the
>problem. However, cache coherency _is_ an issue and is most likely at
>the bottom of this mess for my case. Invalidating whole lines of cache
>is worse when a line is 128 bytes than when it is only 32 bytes. Whether
>that is the problem or not is not yet proven, just a pretty well-thought-out
>"hunch".
Try to think it out more. How could the cache prefer one thread over the other?
I don't see how this is possible with any reasonable design. It's easy enough to
test by writing a simple program, so why don't you do that? And anyway, this
STILL doesn't address my point, which is how HT can cause performance to
degrade.
>Now if you think that Intel really will take 1/2 of the physical CPU resources
>and leave them idle when only one logical processor is working, then I suppose
>your explanation might be valid. However, that would make it a bad design.
No, there's a difference between HT being enabled and HT being active. If only
one thread is being run, all of the CPU's resources are "merged" back together.
The Intel slides indicate this and I wrote as much in reply to Eugene's post.
>>Put in terms you might be able to understand, take a system with 512MB RAM. Run
>>Crafty on it and set the hash table to 256MB. Runs great, right? Now run another
>>copy with a 256MB hash table. Hmm, doesn't run so great, does it?
>
>What does this have to do with the question??? It actually might not run that
>badly, btw...
You have certain resources. A program uses > 50% of those resources. When given
only 50% of the resources, it runs like crap.
Sort of like how a program can use > 50% of a CPU's write combine buffers, and
runs like crap when it's limited to 50%.
Get it?
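
To make the "works great with 4 buffers, thrashes with 3" case concrete, here is a toy sketch in Python. It is an illustration only: it assumes least-recently-used replacement and a program that writes round-robin to a fixed number of streams, and it is not a model of how the P4's write combine buffers actually behave. The name hit_rate and its parameters are made up for this example.

```python
from collections import OrderedDict

def hit_rate(num_buffers, num_streams, rounds=1000):
    """Fraction of writes that find their stream already in a buffer.

    Toy model (assumption): buffers are replaced least-recently-used,
    and the program writes to its streams in strict round-robin order.
    """
    buffers = OrderedDict()  # stream id -> True, ordered oldest to newest
    hits = 0
    total = 0
    for _ in range(rounds):
        for stream in range(num_streams):
            total += 1
            if stream in buffers:
                hits += 1
                buffers.move_to_end(stream)      # mark as most recently used
            else:
                if len(buffers) == num_buffers:
                    buffers.popitem(last=False)  # evict least recently used
                buffers[stream] = True
    return hits / total

if __name__ == "__main__":
    print("4 streams, 4 buffers:", hit_rate(4, 4))
    print("4 streams, 3 buffers:", hit_rate(4, 3) if False else hit_rate(3, 4))
```

Under these assumptions, four streams fit in four buffers and nearly every write hits, while with three buffers each write evicts the stream that is needed next, so no write ever hits. Losing one buffer out of four does not cost 25%; it costs almost everything.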
>>http://www.extremetech.com/print_article/0,3998,a=16756,00.asp
>>
>>The slide in the middle ("Thread-Selection Points") clearly shows what's split in
>>half: queue, rename, decode, and retire. The schedule, reg read, execute, and
>>reg write steps use a toggle that will switch between threads each clock tick if
>>data from two threads is ready. Caches are not split; the reason should be
>>obvious.
>>
>>-Tom
>
>As far as the above, I haven't seen Intel say that the "rename registers" are
>split right down the middle. The first explanation I saw was quite the
>opposite in fact.
I guess it's kind of ambiguous from the slide. Really, I don't care. You seem to
have conceded that most of the resources are split, which is fine with me.
-Tom