Author: Matt Taylor
Date: 20:11:36 12/10/02
On December 10, 2002 at 22:54:45, Robert Hyatt wrote:
>On December 10, 2002 at 21:19:18, Matt Taylor wrote:
>
>>On December 10, 2002 at 21:13:28, Robert Hyatt wrote:
>>
>>>On December 10, 2002 at 20:33:34, Jeremiah Penery wrote:
>>>
>>>>On December 10, 2002 at 20:18:16, Robert Hyatt wrote:
>>>>
>>>>>On December 10, 2002 at 20:12:06, Jeremiah Penery wrote:
>>>>>
>>>>>>On December 10, 2002 at 20:00:11, Robert Hyatt wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 16:43:29, Matt Taylor wrote:
>>>>>>>
>>>>>>>>They said that HT allows -concurrent- scheduling of threads, but the threads
>>>>>>>>obviously cannot make use of the same execution resources. If this is correct,
>>>>>>>>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>>>>>>>>other thread was doing real work.
>>>>>>>
>>>>>>>Again, think about what you just said, which is impossible to happen. If one
>>>>>>>thread is smoking the L1/L2 cache, then it is not waiting for _anything_ and
>>>>>>>once it is scheduled it will execute until the cpu decides to flip to the other
>>>>>>>thread. Or until that thread does a pause. Whichever comes first.
>>>>>>
>>>>>>The point is that the spinning thread blocks no execution units. The processor
>>>>>>can spin the idle thread all it wants, why should that stop it from scheduling
>>>>>>the second thread, which _will_ use the execution units, to run at the same
>>>>>>time?
>>>>>
>>>>>I don't follow. The "spinning thread" completely fills the integer pipe...
>>>>
>>>>Processors have more than one integer pipe, and I'm sure that a spinning thread
>>>>doesn't fill more than one. In a P4, which has dual-pumped ALUs, a spinning
>>>>thread wouldn't even block a single pipe. That is, if the scheduler were smart
>>>>enough to schedule other thread(s) to fill that unit.
>>>
>>>Somehow we are not on the same page. A single tight compute-bound loop can
>>>_completely_ fill one pipe by itself with _no_ problems.
>>>The micro-ops will simply stuff that pipe totally as every branch will be
>>>predicted correctly...
>>>
>>>And if that thread is sucking up the cpu, the _other_ thread is going to
>>>be hindered since it can probably use _everything_ in the CPU when it is
>>>running...
>>>
>>>>
>>>>>The cpu doesn't execute two threads at a time, it flips and flops back and
>>>>>forth between them. The spinning thread will _never_ give up control and has
>>>>>to be either preempted by the cpu, or else it has to do a pause, as explained
>>>>>in the intel white-paper on the subject...
>>>>>
>>>>>Otherwise the pause would _not_ be needed...
>>>>
>>>>What's the point of hyper-threading if two threads don't run at the same time?
>>>>Yeah, sure, you can execute while one thread waits on memory or something, but
>>>>it's certainly not the most efficient use. All the documentation I've seen
>>>>suggests that if one thread is using, say, half the integer pipes, that another
>>>>thread can be scheduled concurrently to use the other half of the pipes.
>>>
>>>What is the point in an operating system for executing two processes at the
>>>same time? Because one blocks and the other uses those unused cycles. That
>>>is the _only_ point of running more than one process at a time. That is the
>>>only point for hyper-threading also. It has just moved a bit of the process
>>>scheduling down into the CPU. The OS feeds the CPU two candidate processes
>>>to "interleave" and the CPU does that at the hardware level, more efficiently.
>>>
>>>As far as sharing pipes, that can happen. But if one thread is burning one
>>>pipe up doing useless work, that is lost cycles that the other thread can't
>>>get to. Which is _the_ point for the "pause" instruction...
>>
>>The integer pipe feeds into 5 integer execution units which can be accessed
>>concurrently each cycle. However, a spin-wait loop will only be able to use 1
>>unit because of register dependencies.
>
>Not necessarily.
>Look at "ThreadWait()" in Crafty. It is a more complicated
>"spin wait" that is testing several things in the same loop... but
>regardless of whether it is one execution unit busy or two or three, it does
>_not_ matter. That is one execution unit that the other thread can't get
>to, which is the point for the "pause" instruction.
>
>Otherwise the "pause" is pointless. Why do you think they implemented that?
>And why do you think they wrote a 7-8 page paper describing how to do
>spinlocks and spinwaits using the pause instruction?

Here are the first two paragraphs on the pause instruction from the P4 manual. I did not continue past that because the manual digresses from function and talks about compatibility, exceptions, pseudo-code, etc.

IA-32 Intel Architecture Software Developer's Manual
Vol. 2: Instruction Set Reference
Order 245471-006
Page 586/966:

Pause -- Spin Loop Hint

Improves the performance of spin-wait loops. When executing a "spin-wait loop," a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The pause instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a pause instruction be placed in all spin-wait loops.

An additional function of the pause instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor's power consumption...
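
For anyone who wants to see what "a pause instruction placed in a spin-wait loop" looks like in practice, here is a minimal sketch in C. It is not Crafty's ThreadWait() and not code from the Intel paper; the names spin_wait, cpu_relax and max_spins are made up for illustration, and the iteration bound exists only so the loop can be exercised without a second thread. It assumes a C11 compiler and uses the _mm_pause() intrinsic on x86, falling back to a no-op elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#include <immintrin.h>           /* declares _mm_pause(), which emits PAUSE */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)    /* assumption: plain no-op on non-x86 targets */
#endif

/*
 * Hypothetical helper: spin until *flag becomes nonzero or until
 * max_spins iterations have passed. Returns true if the flag was
 * observed set, false if the bound ran out first.
 */
bool spin_wait(atomic_int *flag, long max_spins)
{
    assert(flag != NULL);
    for (long i = 0; i < max_spins; i++) {
        /* Acquire load so writes made before the flag was set are visible. */
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        cpu_relax();             /* tell the CPU this is a spin-wait loop */
    }
    return false;
}
```

A waiting thread would call spin_wait(&flag, some_bound) while another thread eventually stores a nonzero value to flag with release ordering. Whether the hint actually frees execution resources for the sibling logical processor is exactly what is being argued above; the sketch only shows where the instruction goes.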
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.