Author: Robert Hyatt
Date: 08:46:58 08/09/02
On August 09, 2002 at 02:39:17, Gian-Carlo Pascutto wrote:

>On August 09, 2002 at 00:00:31, Robert Hyatt wrote:
>
>>This is useless testing. You have two processes running and only one cpu.
>>At any instant in time, one can hold a lock and the other can be spinning
>>waiting on the lock. The process scheduler has no idea which process is
>>doing real work (the one holding the lock) or which is spinning (the one
>>waiting on the lock). So you burn cpu cycles needlessly. If you have an
>>idle loop (as I do using 'thread pools') then the same problem happens
>>there... the scheduler can't tell which is actually searching for a position
>>where the others can help, and which are spinning waiting on work doing nothing
>>useful...
>
>I must be missing something here, because that is exactly what I'm aiming at.
>

OK.. then this question. One process holds the lock. The O/S happens to
schedule the _other_ process, which is spinning on the lock. That process spins
for whatever the scheduling time quantum is, typically about .4 seconds. That
represents a .4 second slice of time when _no_ useful work is being done. On a
real dual, by contrast, the second processor would keep running the lock holder,
which would clear the lock, and the first would only spin for microseconds
rather than hundreds of milliseconds...

>I want each process to get as close to 50% cpu utilisation as possible. That
>means that they _have_ to burn cpu cycles even if they are not doing anything.

That doesn't represent how the real dual machine runs, however. I.e., set the
scheduling quantum to 1 minute and you will see what I mean. If the O/S picks
the spinner rather than the holder, you burn a minute of cpu time that would
never be lost on the dual machine... It just means your estimated speedup is
going to be very pessimistic and will only be useful as a lower bound, with the
actual speedup somewhere above that bound.

>
>Spinning while waiting on a lock is lost performance when running parallel,
>so I want to get loss there as well on the single cpu.
>

You are missing the point. On a real dual, when you set a lock and keep it set
for (say) 10 instructions, _no_ thread will ever spin for more than about 10
cycles. But on a single cpu, a thread can spin for .4 seconds. A _huge_
difference.

>If I wouldn't do the busy-spin, and never split, my parallel stuff would
>finish just as fast as the serial one, and I'd erroneously conclude I've
>got a 2.0 speedup. If I busy spin, I'll finish in double the time, and
>correctly conclude no (1.0) speedup.

You might get a 1.0 on a single, but you will get more on a dual. Which means
you simply get a pessimistic lower bound on the speedup as you are doing it...

>
>--
>GCP
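
The single-cpu versus dual difference argued above can be put into rough numbers with a toy model. This is only a sketch under stated assumptions, not anything taken from either program: time is counted in abstract "ticks", the scheduler is plain round-robin, and the worst case is assumed, where the O/S switches to the spinning process right after the other one takes the lock. The function name `spin_ticks` and its parameters are made up for illustration.

```python
def spin_ticks(num_cpus, quantum, hold_ticks):
    """Ticks the waiting process burns spinning before the lock is released.

    Hypothetical toy model (not from the original post):
      num_cpus   -- 1 for the single-cpu test, 2 for a real dual
      quantum    -- scheduler time slice, in ticks
      hold_ticks -- cpu time the holder still needs before it clears the lock
    """
    holder_left = hold_ticks
    spin = 0

    if num_cpus >= 2:
        # Both processes run at the same time: the waiter spins only
        # while the holder finishes its short critical section.
        while holder_left > 0:
            holder_left -= 1
            spin += 1
        return spin

    # One cpu, round-robin. Worst case: the spinner is scheduled first.
    running = "waiter"
    slice_left = quantum
    while holder_left > 0:
        if running == "waiter":
            spin += 1          # cpu time burned, no useful work done
        else:
            holder_left -= 1   # holder makes progress toward the unlock
        slice_left -= 1
        if slice_left == 0:    # time slice used up: switch processes
            running = "holder" if running == "waiter" else "waiter"
            slice_left = quantum
    return spin


if __name__ == "__main__":
    # Assumed scale: 1 tick = 1 microsecond, so 400_000 ticks is about .4 s.
    print("dual  :", spin_ticks(2, 400_000, 10), "ticks spinning")
    print("single:", spin_ticks(1, 400_000, 10), "ticks spinning")
```

With a 10-tick critical section, the model's dual case loses 10 ticks to spinning, while the single-cpu case loses the whole time slice. That gap is why a single-cpu measurement of this kind can only give a pessimistic lower bound on the speedup.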
Last modified: Thu, 15 Apr 21 08:11:13 -0700