Author: Tom Kerrigan
Date: 20:45:09 05/23/03
On May 23, 2003 at 22:56:43, Robert Hyatt wrote:
>On May 23, 2003 at 02:50:41, Tom Kerrigan wrote:
>
>>On May 22, 2003 at 22:24:29, Robert Hyatt wrote:
>>
>>>On May 22, 2003 at 13:43:55, Tom Kerrigan wrote:
>>>
>>>>On May 21, 2003 at 22:20:57, Robert Hyatt wrote:
>>>>
>>>>>On May 21, 2003 at 15:48:46, Tom Kerrigan wrote:
>>>>>
>>>>>>On May 21, 2003 at 13:46:26, Robert Hyatt wrote:
>>>>>>
>>>>>>>On May 20, 2003 at 13:52:01, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>On May 20, 2003 at 00:26:49, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>Actually it _does_ surprise me. The basic idea is that HT provides improved
>>>>>>>>>resource utilization within the CPU. IE would you prefer to have a dual 600mhz
>>>>>>>>>or a single 1000mhz machine? I'd generally prefer the dual 600, although for
>>>>>>>>
>>>>>>>>You're oversimplifying HT. When HT is running two threads, each thread only gets
>>>>>>>>half of the core's resources. So instead of your 1GHz vs. dual 600MHz situation,
>>>>>>>>what you have is more like a 1GHz Pentium 4 vs. a dual 1GHz Pentium. The dual
>>>>>>>>will usually be faster, but in many cases it will be slower, sometimes by a wide
>>>>>>>>margin.
>>>>>>>
>>>>>>>Not quite. Otherwise how do you explain my NPS _increase_ when using a second
>>>>>>>thread on a single physical cpu?
>>>>>>>
>>>>>>>The issue is that now things can be overlapped and more of the CPU core
>>>>>>>gets utilized for a greater percent of the total run-time...
>>>>>>>
>>>>>>>If it were just 50-50 then there would be _zero_ improvement for perfect
>>>>>>>algorithms, and a negative improvement for any algorithm with any overhead
>>>>>>>whatsoever...
>>>>>>>
>>>>>>>And the 50-50 doesn't even hold true for all cases, as my test results have
>>>>>>>shown, even though I have yet to find any reason for what is going on...
>>>>>>
>>>>>>Think a little bit before posting, Bob. I said that the chip's execution
>>>>>>resources were evenly split, I didn't say that the chip's performance is evenly
>>>>>>split. That's just stupid. You have to figure in how those execution resources
>>>>>>are utilized and understand that adding more of these resources gives you
>>>>>>diminishing returns.
>>>>>>
>>>>>>-Tom
>>>>>
>>>>>
>>>>>You should follow your own advice. If resources are split "50-50" then how
>>>>>can _my_ program produce a 70-30 split on occasion?
>>>>>
>>>>>It simply is _not_ possible.
>>>>>
>>>>>There is more to this than a simple explanation offers...
>>>>
>>>>Now you're getting off onto another topic here.
>>>>
>>>
>>>Read backward. _I_ did not "change the topic".
>>>
>>>I said that I don't see how it is possible for HT to slow a program down.
>>>
>>>You said "50-50" resource allocation might be an explanation.
>>>
>>>I said "that doesn't seem plausible because I have at least one example of
>>>two compute-bound threads that don't show a 50-50 balance on SMT."
>>
>>I said it before and I'll say it again, a 50-50 _core_ resource split does not
>>mean a 50-50 performance split. Again, you have to account for how those
>>resources are utilized. Anybody who's passed the first semester of comp arch
>>should be able to grasp this immediately.
>
>You should be able to grasp this: I am running _exactly_ the same program
>on _both_ processors. And when I say "exactly" the same I mean _exactly the
>same_. In fact, I am using the _same_ virtual address space on _both_ logical
>processors.
>
>So your reasoning simply doesn't fly in this case. If the resource units are
>split and are both running the _same_ identical instruction stream, the
>performance should be exactly split as well. But in my case, it isn't.
>
>There is another explanation... Somewhere...
Again, it seems like you're back to your stupid 70-30 problem.
We can deal with this in a sec, let's get back to the actual point, which is
programs slowing down, or not slowing down, with HT turned on.
First of all, okay, sure, let's say you're right and only SOME of the resources
are split. Even if only the write combine buffers are split, and you have a
program that works great with 4 buffers but starts "thrashing" with 3 buffers,
don't you see how that would cause the program to run inordinately slow with HT
on? Or if the processor can extract great parallelism from the instruction
stream with an n entry reorder window but very little parallelism with an n/2
window?
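A toy model of the thrashing case, not a model of the Pentium 4's actual write-combining logic: it assumes a hypothetical buffer pool with LRU replacement (`count_hits` and the access pattern are made up for illustration) and a program that cycles through four distinct lines. With four buffers everything fits after warm-up; with three, every access evicts the line that is needed next.

```python
from collections import OrderedDict

def count_hits(capacity, accesses):
    """Count hits for a hypothetical LRU-managed pool of `capacity` buffers."""
    buffers = OrderedDict()  # keys are line addresses, ordered oldest to newest
    hits = 0
    for line in accesses:
        if line in buffers:
            hits += 1
            buffers.move_to_end(line)        # mark as most recently used
        else:
            if len(buffers) >= capacity:
                buffers.popitem(last=False)  # evict the least recently used line
            buffers[line] = True
    return hits

# Illustrative workload: cycle through 4 distinct lines, 100 times (400 accesses).
pattern = [0, 1, 2, 3] * 100

hits_with_4 = count_hits(4, pattern)
hits_with_3 = count_hits(3, pattern)
print(hits_with_4, hits_with_3)  # prints "396 0"
```

In this sketch, losing one buffer out of four takes the hit count from 396 of 400 to zero, which is the kind of cliff a partitioned resource can produce.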
Put in terms you might be able to understand, take a system with 512MB RAM. Run
Crafty on it and set the hash table to 256MB. Runs great, right? Now run another
copy with a 256MB hash table. Hmm, doesn't run so great, does it?
As for your 70-30 problem, you are not running _exactly_ the same program on
both logical processors. Remember, you did that and the performance was split
exactly 50-50. Your problem is when you start doing threads. That is NOT running
_exactly_ the same program. E.g., if one thread is spinning, waiting for a lock,
how is that doing exactly the same thing as the other thread?
>>Complete bull. This design is no secret--Intel wants everybody to know exactly
>>how HT works so they can optimize their software for it. This information is all
>>over Intel's web pages and developer documentation. Links to said pages have
>>been posted to this message board. It will only take YOU some time to figure out
>>because your head seems to be stuck in the sand.
>>
>>-Tom
>
>Give me a link. I have read almost _everything_ on Intel's web site. And I
>don't find key core descriptions of what is done _internally_...
I don't feel like doing extra work for you, so I just did a 2 second Google
search ("xeon hyperthreading split reorder") and found this page from Intel
presentations:
http://www.extremetech.com/print_article/0,3998,a=16756,00.asp
The slide in the middle ("Thread-Selection Points") clearly shows what's split in
half: queue, rename, decode, and retire. The schedule, reg read, execute, and
reg write steps use a toggle that will switch between threads each clock tick if
data from two threads is ready. Caches are not split; the reason should be
obvious.
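A rough sketch of that toggle as I read the slide, under heavy assumptions: one shared slot per clock tick, strict alternation when both threads have data ready, and the slot going to whichever thread is ready otherwise. `share_slots` and the ready patterns are invented for illustration and say nothing about real Pentium 4 timings.

```python
def share_slots(ready_a, ready_b):
    """Return (slots for A, slots for B) under a simple toggle policy."""
    granted = [0, 0]
    last = 1  # pretend B went last, so A wins the first tie
    for a, b in zip(ready_a, ready_b):
        if a and b:
            pick = 1 - last   # both ready: alternate between threads
        elif a:
            pick = 0          # only A has data ready
        elif b:
            pick = 1          # only B has data ready
        else:
            continue          # nobody ready: the slot goes idle
        granted[pick] += 1
        last = pick
    return tuple(granted)

cycles = 1000
always = [True] * cycles
# Hypothetical thread that is stalled 3 ticks out of every 4.
mostly_stalled = [(i % 4 == 0) for i in range(cycles)]

both_busy = share_slots(always, always)
one_stalls = share_slots(always, mostly_stalled)
print(both_busy, one_stalls)  # prints "(500, 500) (751, 249)"
```

The same even-handed toggle hands out 500/500 when both threads always have work ready and 751/249 when one of them is usually stalled, so in this sketch the split depends on what each thread is doing rather than on the selection policy.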
-Tom
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.