Computer Chess Club Archives

Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Robert Hyatt

Date: 21:28:09 12/10/02

On December 10, 2002 at 23:11:36, Matt Taylor wrote:

>On December 10, 2002 at 22:54:45, Robert Hyatt wrote:
>
>>On December 10, 2002 at 21:19:18, Matt Taylor wrote:
>>
>>>On December 10, 2002 at 21:13:28, Robert Hyatt wrote:
>>>
>>>>On December 10, 2002 at 20:33:34, Jeremiah Penery wrote:
>>>>
>>>>>On December 10, 2002 at 20:18:16, Robert Hyatt wrote:
>>>>>
>>>>>>On December 10, 2002 at 20:12:06, Jeremiah Penery wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 20:00:11, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On December 10, 2002 at 16:43:29, Matt Taylor wrote:
>>>>>>>>
>>>>>>>>>They said that HT allows -concurrent- scheduling of threads, but the threads
>>>>>>>>>obviously cannot make use of the same execution resources. If this is correct,
>>>>>>>>>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>>>>>>>>>other thread was doing real work.
>>>>>>>>
>>>>>>>>Again, think about what you just said, which cannot happen.  If one
>>>>>>>>thread is smoking the L1/L2 cache, then it is not waiting for _anything_ and
>>>>>>>>once it is scheduled it will execute until the cpu decides to flip to the other
>>>>>>>>thread.  Or until that thread does a pause.  Whichever comes first.
>>>>>>>
>>>>>>>The point is that the spinning thread blocks no execution units.  The processor
>>>>>>>can spin the idle thread all it wants; why should that stop it from scheduling
>>>>>>>the second thread, which _will_ use the execution units, to run at the same
>>>>>>>time?
>>>>>>
>>>>>>
>>>>>>I don't follow.  The "spinning thread" completely fills the integer pipe...
>>>>>
>>>>>Processors have more than one integer pipe, and I'm sure that a spinning thread
>>>>>doesn't fill more than one.  In a P4, which has dual-pumped ALUs, a spinning
>>>>>thread wouldn't even block a single pipe.  That is, if the scheduler were smart
>>>>>enough to schedule other thread(s) to fill that unit.
>>>>
>>>>Somehow we are not on the same page. A single tight compute-bound loop can
>>>>_completely_ fill one pipe by itself with _no_ problems.  The micro-ops
>>>>will simply stuff that pipe totally as every branch will be predicted
>>>>correctly...
>>>>
>>>>And if that thread is sucking up the cpu, the _other_ thread is going to
>>>>be hindered since it can probably use _everything_ in the CPU when it is
>>>>running...
>>>>
>>>>
>>>>>
>>>>>>The cpu doesn't execute two threads at a time, it flips and flops back and
>>>>>>forth between them.  The spinning thread will _never_ give up control and has
>>>>>>to be either preempted by the cpu, or else it has to do a pause, as explained
>>>>>>in the intel white-paper on the subject...
>>>>>>
>>>>>>Otherwise the pause would _not_ be needed...
>>>>>
>>>>>What's the point of hyper-threading if two threads don't run at the same time?
>>>>>Yeah, sure, you can execute while one thread waits on memory or something, but
>>>>>it's certainly not the most efficient use.  All the documentation I've seen
>>>>>suggests that if one thread is using, say, half the integer pipes, that another
>>>>>thread can be scheduled concurrently to use the other half of the pipes.
>>>>
>>>>
>>>>
>>>>What is the point in an operating system for executing two processes at the
>>>>same time?  Because one blocks and the other uses those unused cycles.  That
>>>>is the _only_ point of running more than one process at a time.  That is the
>>>>only point for hyper-threading also.  It has just moved a bit of the process
>>>>scheduling down into the CPU.  The OS feeds the CPU two candidate processes
>>>>to "interleave" and the CPU does that at the hardware level, more efficiently.
>>>>
>>>>As far as sharing pipes, that can happen.  But if one thread is burning one
>>>>pipe up doing useless work, that is lost cycles that the other thread can't
>>>>get to.  Which is _the_ point for the "pause" instruction...
>>>
>>>The integer pipe feeds into 5 integer execution units which can be accessed
>>>concurrently each cycle. However, a spin-wait loop will only be able to use 1
>>>unit because of register dependencies.
>>
>>
>>Not necessarily.  Look at "ThreadWait()" in Crafty.  It is a more complicated
>>"spin wait" that is testing several things in the same loop...  but
>>regardless of whether it is one execution unit busy or two or three, it does
>>_not_ matter.  That is one execution unit that the other thread can't get
>>to, which is the point for the "pause" instruction.
>>
>>Otherwise the "pause" is pointless.  Why do you think they implemented that?
>>And why do you think they wrote a 7-8 page paper describing how to do
>>spinlocks and spinwaits using the pause instruction?
>
>Here are the first two paragraphs on the pause instruction from the P4 manual. I
>did not continue past that because the manual digresses from function and talks
>about compatibility, exceptions, pseudo-code, etc.
>
>IA-32 Intel Architecture Software Developer's Manual Vol. 2: Instruction Set
>Reference
>Order 245471-006
>
>Page 586/966: Pause -- Spin Loop Hint
>
>Improves the performance of spin-wait loops. When executing a "spin-wait loop,"
>a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when
>exiting the loop because it detects a possible memory order violation. The pause
>instruction provides a hint to the processor that the code sequence is a
>spin-wait loop. The processor uses this hint to avoid the memory order violation
>in most situations, which greatly improves processor performance. For this
>reason, it is recommended that a pause instruction be placed in all spin-wait
>loops.
>
>An additional function of the pause instruction is to reduce the power consumed
>by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor
>can execute a spin-wait loop extremely quickly, causing the processor to consume
>a lot of power while it waits for the resource it is spinning on to become
>available. Inserting a pause instruction in a spin-wait loop greatly reduces the
>processor's power consumption...


That text predates SMT.  Speculative execution can load multiple iterations
of a spin-lock loop into the pipe, and that causes problems because the CPU
executes memory operations out of order, which triggers the memory-order
violation the manual mentions when another CPU modifies the cache line being
spun on.  The pause prevents more than one iteration from entering the pipe,
avoiding that problem.
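
A minimal sketch of a spin-wait loop with a PAUSE hint, assuming C11
<stdatomic.h> and the GCC/Clang `_mm_pause()` intrinsic from <immintrin.h> on
x86.  The names `spin_wait` and `SPIN_HINT` are hypothetical, and this is not
Crafty's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SPIN_HINT() _mm_pause()   /* emits the x86 PAUSE instruction */
#else
#define SPIN_HINT() ((void)0)     /* no PAUSE on this target: plain spin */
#endif

/* Hypothetical helper: spin until *flag becomes true.  The acquire load
   makes data written before the flag was set visible after the loop.
   Returns how many times the loop spun (0 if the flag was already set). */
static unsigned long spin_wait(atomic_bool *flag)
{
    unsigned long spins = 0;
    while (!atomic_load_explicit(flag, memory_order_acquire)) {
        /* Spin-loop hint: Intel documents PAUSE as avoiding the
           memory-order-violation penalty when the loop exits. */
        SPIN_HINT();
        spins++;
    }
    return spins;
}
```

In real use another thread sets the flag with a release store, for example
`atomic_store_explicit(&flag, true, memory_order_release)`, after publishing
whatever data the waiter needs.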

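But for hyper-threading, it does more: it keeps the spinning logical cpu from
tying up execution resources that the other logical cpu could be using.  If you
go to Intel's site and search for hyper-threading, you can find at least a couple
of papers that discuss the spin-lock with hyper-threading issue in detail.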
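
A sketch of a spin lock in the "test-and-test-and-set" style, a common pattern
for this problem.  It only attempts the atomic exchange when the lock looks
free, and spins on plain loads with PAUSE otherwise.  It assumes C11
<stdatomic.h> and `_mm_pause()` on x86.  The type and function names are
hypothetical and are not taken from Intel's papers or from Crafty.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SPIN_HINT() _mm_pause()   /* x86 PAUSE spin-loop hint */
#else
#define SPIN_HINT() ((void)0)     /* no PAUSE on this target */
#endif

typedef struct { atomic_bool held; } tts_lock;   /* hypothetical lock type */

static void tts_lock_init(tts_lock *l) { atomic_init(&l->held, false); }

/* One attempt: returns true if this call acquired the lock. */
static bool tts_try_acquire(tts_lock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void tts_acquire(tts_lock *l)
{
    for (;;) {
        if (tts_try_acquire(l))
            return;
        /* Lock is taken: wait with read-only loads (no bus-locking writes)
           plus PAUSE until it looks free, then retry the exchange. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            SPIN_HINT();
    }
}

static void tts_release(tts_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The read-only inner loop keeps the waiting cpu from repeatedly forcing the
cache line into an exclusive state, and the PAUSE inside it is the hint the
manual recommends for every spin-wait loop.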

I'm not sure the power-consumption claim holds, since power savings is the
reason given for using a "halt" to stop a thread from buzzing a (logical) cpu
when possible.  A normal user program can't execute "halt" (it is a privileged
instruction), so it has to resort to "pause" as a second-best option.  This is
from the "long spin-wait and hyper-threading" article on Intel's developer site.
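
Because "halt" is unavailable in user mode, one common user-level compromise
for long waits is to spin briefly with PAUSE and then hand the cpu back to the
OS.  This is a minimal sketch assuming POSIX `sched_yield()` from <sched.h>,
C11 atomics and `_mm_pause()` on x86.  The function name and the spin limit
are hypothetical tuning choices, not values from Intel's article.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SPIN_HINT() _mm_pause()   /* x86 PAUSE spin-loop hint */
#else
#define SPIN_HINT() ((void)0)     /* no PAUSE on this target */
#endif

/* Hypothetical helper: wait for *flag, spinning up to spin_limit times with
   PAUSE per round, then calling sched_yield() so the OS can run something
   else.  The flag is checked at least once per round, even if spin_limit
   is 0.  Returns the number of sched_yield() calls made. */
static unsigned long spin_then_yield(atomic_bool *flag, unsigned spin_limit)
{
    unsigned long yields = 0;
    for (;;) {
        unsigned i = 0;
        do {
            if (atomic_load_explicit(flag, memory_order_acquire))
                return yields;
            SPIN_HINT();
        } while (++i < spin_limit);
        sched_yield();   /* give this (logical) cpu back to the scheduler */
        yields++;
    }
}
```

A short spin handles the common case where the resource frees up quickly,
while the yield keeps a long wait from burning a logical cpu that the other
thread on the same physical processor could use.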

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.