Computer Chess Club Archives


Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Robert Hyatt

Date: 19:54:45 12/10/02


On December 10, 2002 at 21:19:18, Matt Taylor wrote:

>On December 10, 2002 at 21:13:28, Robert Hyatt wrote:
>
>>On December 10, 2002 at 20:33:34, Jeremiah Penery wrote:
>>
>>>On December 10, 2002 at 20:18:16, Robert Hyatt wrote:
>>>
>>>>On December 10, 2002 at 20:12:06, Jeremiah Penery wrote:
>>>>
>>>>>On December 10, 2002 at 20:00:11, Robert Hyatt wrote:
>>>>>
>>>>>>On December 10, 2002 at 16:43:29, Matt Taylor wrote:
>>>>>>
>>>>>>>They said that HT allows -concurrent- scheduling of threads, but the threads
>>>>>>>obviously cannot make use of the same execution resources. If this is correct,
>>>>>>>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>>>>>>>other thread was doing real work.
>>>>>>
>>>>>>Again, think about what you just said, which cannot happen.  If one
>>>>>>thread is smoking the L1/L2 cache, then it is not waiting for _anything_ and
>>>>>>once it is scheduled it will execute until the cpu decides to flip to the other
>>>>>>thread.  Or until that thread does a pause.  Whichever comes first.
>>>>>
>>>>>The point is that the spinning thread blocks no execution units.  The processor
>>>>>can spin the idle thread all it wants, why should that stop it from scheduling
>>>>>the second thread, which _will_ use the execution units, to run at the same
>>>>>time?
>>>>
>>>>
>>>>I don't follow.  The "spinning thread" completely fills the integer pipe...
>>>
>>>Processors have more than one integer pipe, and I'm sure that a spinning thread
>>>doesn't fill more than one.  In a P4, which has dual-pumped ALUs, a spinning
>>>thread wouldn't even block a single pipe.  That is, if the scheduler were smart
>>>enough to schedule other thread(s) to fill that unit.
>>
>>Somehow we are not on the same page. A single tight compute-bound loop can
>>_completely_ fill one pipe by itself with _no_ problems.  The micro-ops
>>will simply stuff that pipe totally as every branch will be predicted
>>correctly...
>>
>>And if that thread is sucking up the cpu, the _other_ thread is going to
>>be hindered since it can probably use _everything_ in the CPU when it is
>>running...
>>
>>
>>>
>>>>The cpu doesn't execute two threads at a time, it flips and flops back and
>>>>forth between them.  The spinning thread will _never_ give up control and has
>>>>to be either preempted by the cpu, or else it has to do a pause, as explained
>>>>in the intel white-paper on the subject...
>>>>
>>>>Otherwise the pause would _not_ be needed...
>>>
>>>What's the point of hyper-threading if two threads don't run at the same time?
>>>Yeah, sure, you can execute while one thread waits on memory or something, but
>>>it's certainly not the most efficient use.  All the documentation I've seen
>>>suggests that if one thread is using, say, half the integer pipes, that another
>>>thread can be scheduled concurrently to use the other half of the pipes.
>>
>>
>>
>>What is the point in an operating system for executing two processes at the
>>same time?  Because one blocks and the other uses those unused cycles.  That
>>is the _only_ point of running more than one process at a time.  That is the
>>only point for hyper-threading also.  It has just moved a bit of the process
>>scheduling down into the CPU.  The OS feeds the CPU two candidate processes
>>to "interleave" and the CPU does that at the hardware level, more efficiently.
>>
>>As far as sharing pipes, that can happen.  But if one thread is burning one
>>pipe up doing useless work, that is lost cycles that the other thread can't
>>get to.  Which is _the_ point for the "pause" instruction...
>
>The integer pipe feeds into 5 integer execution units which can be accessed
>concurrently each cycle. However, a spin-wait loop will only be able to use 1
>unit because of register dependencies.


Not necessarily.  Look at "ThreadWait()" in Crafty.  It is a more complicated
"spin wait" that tests several things in the same loop...  but regardless of
whether it keeps one execution unit busy or two or three, it does _not_
matter.  Each one is an execution unit that the other thread can't get
to, which is the point of the "pause" instruction.

Otherwise the "pause" is pointless.  Why do you think they implemented that?
And why do you think they wrote a 7-8 page paper describing how to do
spinlocks and spinwaits using the pause instruction?
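
For illustration only, here is a minimal sketch of a spin wait that issues
"pause" on every iteration.  This is _not_ Crafty's ThreadWait(); the names
spin_wait and CPU_RELAX are hypothetical, and the sketch assumes a C11
compiler such as GCC or Clang, where _mm_pause() from <immintrin.h> emits
the pause instruction on x86.  On other architectures it falls back to a
no-op.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* emits the x86 "pause" instruction */
#else
#define CPU_RELAX() ((void)0)     /* assumption: no-op fallback elsewhere */
#endif

/* Hypothetical helper: spin until *flag becomes nonzero or max_spins
   iterations have passed.  Returns the number of iterations spent waiting. */
static long spin_wait(atomic_int *flag, long max_spins)
{
    long spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
           spins < max_spins) {
        CPU_RELAX();              /* hint to the CPU that this is a spin loop */
        spins++;
    }
    return spins;
}
```

The max_spins bound is only there so the sketch terminates when nothing
ever sets the flag; a real spin wait would more likely loop until the flag
changes or fall back to blocking in the OS.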



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.