Author: Robert Hyatt
Date: 21:28:09 12/10/02
On December 10, 2002 at 23:11:36, Matt Taylor wrote:
>On December 10, 2002 at 22:54:45, Robert Hyatt wrote:
>
>>On December 10, 2002 at 21:19:18, Matt Taylor wrote:
>>
>>>On December 10, 2002 at 21:13:28, Robert Hyatt wrote:
>>>
>>>>On December 10, 2002 at 20:33:34, Jeremiah Penery wrote:
>>>>
>>>>>On December 10, 2002 at 20:18:16, Robert Hyatt wrote:
>>>>>
>>>>>>On December 10, 2002 at 20:12:06, Jeremiah Penery wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 20:00:11, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On December 10, 2002 at 16:43:29, Matt Taylor wrote:
>>>>>>>>
>>>>>>>>>They said that HT allows -concurrent- scheduling of threads, but the threads
>>>>>>>>>obviously cannot make use of the same execution resources. If this is correct,
>>>>>>>>>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>>>>>>>>>other thread was doing real work.
>>>>>>>>
>>>>>>>>Again, think about what you just said, which is impossible to happen. If one
>>>>>>>>thread is smoking the L1/L2 cache, then it is not waiting for _anything_ and
>>>>>>>>once it is scheduled it will execute until the cpu decides to flip to the other
>>>>>>>>thread. Or until that thread does a pause. Whichever comes first.
>>>>>>>
>>>>>>>The point is that the spinning thread blocks no execution units. The processor
>>>>>>>can spin the idle thread all it wants, why should that stop it from scheduling
>>>>>>>the second thread, which _will_ use the execution units, to run at the same
>>>>>>>time?
>>>>>>
>>>>>>
>>>>>>I don't follow. The "spinning thread" completely fills the integer pipe...
>>>>>
>>>>>Processors have more than one integer pipe, and I'm sure that a spinning thread
>>>>>doesn't fill more than one. In a P4, which has dual-pumped ALUs, a spinning
>>>>>thread wouldn't even block a single pipe. That is, if the scheduler were smart
>>>>>enough to schedule other thread(s) to fill that unit.
>>>>
>>>>Somehow we are not on the same page.
>>>>A single tight compute-bound loop can
>>>>_completely_ fill one pipe by itself with _no_ problems. The micro-ops
>>>>will simply stuff that pipe totally as every branch will be predicted
>>>>correctly...
>>>>
>>>>And if that thread is sucking up the cpu, the _other_ thread is going to
>>>>be hindered since it can probably use _everything_ in the CPU when it is
>>>>running...
>>>>
>>>>
>>>>>
>>>>>>The cpu doesn't execute two threads at a time, it flips and flops back and
>>>>>>forth between them. The spinning thread will _never_ give up control and has
>>>>>>to be either preempted by the cpu, or else it has to do a pause, as explained
>>>>>>in the intel white-paper on the subject...
>>>>>>
>>>>>>Otherwise the pause would _not_ be needed...
>>>>>
>>>>>What's the point of hyper-threading if two threads don't run at the same time?
>>>>>Yeah, sure, you can execute while one thread waits on memory or something, but
>>>>>it's certainly not the most efficient use. All the documentation I've seen
>>>>>suggests that if one thread is using, say, half the integer pipes, that another
>>>>>thread can be scheduled concurrently to use the other half of the pipes.
>>>>
>>>>
>>>>
>>>>What is the point in an operating system for executing two processes at the
>>>>same time? Because one blocks and the other uses those unused cycles. That
>>>>is the _only_ point of running more than one process at a time. That is the
>>>>only point for hyper-threading also. It has just moved a bit of the process
>>>>scheduling down into the CPU. The OS feeds the CPU two candidate processes
>>>>to "interleave" and the CPU does that at the hardware level, more efficiently.
>>>>
>>>>As far as sharing pipes, that can happen. But if one thread is burning one
>>>>pipe up doing useless work, that is lost cycles that the other thread can't
>>>>get to. Which is _the_ point for the "pause" instruction...
>>>
>>>The integer pipe feeds into 5 integer execution units which can be accessed
>>>concurrently each cycle. However, a spin-wait loop will only be able to use 1
>>>unit because of register dependencies.
>>
>>
>>Not necessarily. Look at "ThreadWait()" in Crafty. It is a more complicated
>>"spin wait" that is testing several things in the same loop... but
>>regardless of whether it is one execution unit busy or two or three, it does
>>_not_ matter. That is one execution unit that the other thread can't get
>>to, which is the point for the "pause" instruction.
>>
>>Otherwise the "pause" is pointless. Why do you think they implemented that?
>>And why do you think they wrote a 7-8 page paper describing how to do
>>spinlocks and spinwaits using the pause instruction?
>
>Here are the first two paragraphs on the pause instruction from the P4 manual. I
>did not continue past that because the manual digresses from function and talks
>about compatibility, exceptions, pseudo-code, etc.
>
>IA-32 Intel Architecture Software Developer's Manual Vol. 2: Instruction Set
>Reference
>Order 245471-006
>
>Page 586/966: Pause -- Spin Loop Hint
>
>Improves the performance of spin-wait loops. When executing a "spin-wait loop,"
>a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when
>exiting the loop because it detects a possible memory order violation. The pause
>instruction provides a hint to the processor that the code sequence is a
>spin-wait loop. The processor uses this hint to avoid the memory order violation
>in most situations, which greatly improves processor performance. For this
>reason, it is recommended that a pause instruction be placed in all spin-wait
>loops.
>
>An additional function of the pause instruction is to reduce the power consumed
>by a Pentium 4 processor while executing a spin loop.
>The Pentium 4 processor
>can execute a spin-wait loop extremely quickly, causing the processor to consume
>a lot of power while it waits for the resource it is spinning on to become
>available. Inserting a pause instruction in a spin-wait loop greatly reduces the
>processor's power consumption...

That describes the behavior prior to SMT. Speculative execution can load multiple iterations of a spin-lock into the pipe, and that causes problems since the CPU can do out-of-order writes, which can lead to errors when fiddling with cache lines. The pause prevents more than one iteration from entering the pipe, avoiding that problem.

But for hyper-threading, it does more. If you go to Intel's site and search for hyper-threading, you can find at least a couple of papers that discuss the spin-lock issue with hyper-threading in detail.

I don't know that the power-consumption claim is true, as that is what is given as the reason for using a "halt" to stop a thread from buzzing a cpu (logical cpu) when possible, although a normal user, who cannot execute the privileged halt, has to resort to "pause" as a second-best option. This is from the "long spin-wait and hyper-threading" article on the Intel developer's site.
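
For anyone following along, here is a minimal sketch of the kind of spin-wait with a pause hint being argued about. It is not Crafty's ThreadWait(). The names cpu_relax and spin_wait_bounded are made up for illustration, and the sketch assumes a C11 compiler with <stdatomic.h> and, on x86, the _mm_pause() intrinsic from <immintrin.h>. The iteration bound is an addition for illustration only, so that a caller could fall back to something else on a long wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>            /* _mm_pause() emits the PAUSE instruction */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)     /* assumption: no spin hint on other targets */
#endif

/* Hypothetical helper: spin until *flag becomes nonzero or the budget of
 * max_spins pause iterations is used up.
 * Returns the number of pause iterations executed before the flag was seen
 * set, or -1 if the budget ran out first. */
static long spin_wait_bounded(atomic_int *flag, long max_spins)
{
    long spins = 0;

    assert(flag != NULL);
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (spins >= max_spins)
            return -1;            /* give up; caller may yield or block */
        cpu_relax();              /* hint to the CPU that this is a spin-wait */
        ++spins;
    }
    return spins;
}
```

In real use a second thread would set the flag with a release store. Calling spin_wait_bounded(&flag, 1000) on a flag that is already set returns 0 without executing a pause at all.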
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.