Computer Chess Club Archives

Subject: Re: Linux problem at cc-NUMA machines

Author: Robert Hyatt
Date: 14:40:54 09/05/03

On September 05, 2003 at 08:51:44, Vincent Diepeveen wrote:

>On September 04, 2003 at 11:08:28, Robert Hyatt wrote:
>
>>On September 03, 2003 at 20:31:55, Vincent Diepeveen wrote:
>>
>>>On September 03, 2003 at 20:20:05, Vincent Diepeveen wrote:
>>>
>>>>On September 03, 2003 at 16:28:18, Robert Hyatt wrote:
>>>>
>>>>>On September 03, 2003 at 15:27:15, Vincent Diepeveen wrote:
>>>>>
>>>>>>On September 03, 2003 at 13:15:48, Robert Hyatt wrote:
>>>>>>
>>>>>>>On September 03, 2003 at 12:23:08, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On September 03, 2003 at 10:54:37, Sune Fischer wrote:
>>>>>>>>
>>>>>>>>>On September 03, 2003 at 10:48:31, Vincent Diepeveen wrote:
>>>>>>>>>
>>>>>>>>>>>I only see the need for communication when there is *something* to communicate.
>>>>>>>>>>
>>>>>>>>>>You answer your own question already. There is continuously something to
>>>>>>>>>>communicate.
>>>>>>>>>
>>>>>>>>>Such as?
>>>>>>>>>
>>>>>>>>>Whatever it is maybe it can be redesigned by using a smarter message system.
>>>>>>>>>
>>>>>>>>>The parent thread doesn't need to know *what* the child thread is doing, it only
>>>>>>>>>needs to know what the child thread finds, if anything at all, right?
>>>>>>>>>
>>>>>>>>>-S.
>>>>>>>>
>>>>>>>>The only one who you are confusing is yourself.
>>>>>>>>
>>>>>>>>DIEP runs fine at any latency, but the speedup simply gets a lot less when the
>>>>>>>>latency goes up.
>>>>>>>>
>>>>>>>>There are many practical problems.
>>>>>>>>
>>>>>>>>You speak about shipping messages.
>>>>>>>>
>>>>>>>>When are you going to receive them? Check every millisecond?
>>>>>>>>
>>>>>>>>Or let the OS decide?
>>>>>>>>
>>>>>>>>The OS timer fires at 100Hz, so for processes that the OS has put to sleep
>>>>>>>>(when locking, after 600 failed attempts to get the lock) you have a latency
>>>>>>>>of 10 ms before the process is awake.
>>>>>>>>
>>>>>>>>You are aware of such problems?
>>>>>>>>
>>>>>>>
>>>>>>>No, because there is no such problem.  If you are running something else on
>>>>>>>the same CPU, then you will see that 10ms latency.  If that CPU is idle, then
>>>>>>>the instant the process is unblocked it will begin execution.
>>>>>>
>>>>>>Wrong, the 10ms latency is there to put something in the RUN queue of the
>>>>>>kernel. Though there is no technical reason that stops the OS programmers
>>>>>>from removing that 10ms, they are not allowed to do that, because that would
>>>>>>violate agreements with important software manufacturers who have written
>>>>>>software that assumes a 10ms latency here, and this crucial software will crash
>>>>>>and cause severe problems if it is no longer there.
>>>>>>
>>>>>>The OS helpdesk.
>>>>>
>>>>>It absolutely does _not_ work like that.  What happens is this:
>>>>>
>>>>>Processes are blocked.  As an interrupt comes in, a process gets moved from
>>>>>blocked to ready.  The temptation is to move that process from ready to run
>>>>>if it is higher in priority than the process already in run.  But that causes
>>>>>excessive context switching.  So, the process gets moved to ready and there
>>>>>it sits until the next 10ms timer interrupt fires, and _then_ the scheduler
>>>>>is called to move the currently running process back to ready, and the newly
>>>>>ready process (of a higher priority) into running.
>>>>>
>>>>>That is _all_ there is to it.
>>>>>
>>>>>If the CPU is idle, and the interrupt comes in, the process is scheduled
>>>>>_right now_, it goes from blocked to ready to run _instantly_.  No 10ms
>>>>>delay.
>>>>>
>>>>>There is no doubt about how that works.  And your explanation is simply
>>>>>garbage.  Ask some of the linux kernel guys.  Ingo Molnar is a good one to
>>>>>ask although Alan Cox will also answer.
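
[Sketch: the wake-up rule described in the quoted post above can be written as
a toy model in Python. This is not kernel code. The 10 ms tick comes from the
thread; the function names and the evenly spaced unblock times are assumptions
made purely for illustration.]

```python
import math

TICK_MS = 10.0  # assumed timer period (100 Hz), the figure used in the thread


def wakeup_delay_ms(unblock_time_ms, cpu_idle, tick_ms=TICK_MS):
    """Delay between a process becoming ready and starting to run.

    Toy model of the rule in the post: on an idle CPU the process runs at
    once; on a busy CPU it sits in the ready queue until the next timer
    tick calls the scheduler.
    """
    if cpu_idle:
        return 0.0
    # An unblock that lands exactly on a tick is handled by that tick.
    next_tick = math.ceil(unblock_time_ms / tick_ms) * tick_ms
    return next_tick - unblock_time_ms


def mean_delay_ms(samples, cpu_idle, tick_ms=TICK_MS):
    """Average delay over evenly spaced unblock times within one tick."""
    times = [i * tick_ms / samples for i in range(samples)]
    return sum(wakeup_delay_ms(t, cpu_idle, tick_ms) for t in times) / samples


if __name__ == "__main__":
    print("busy CPU, unblocked at 3 ms:", wakeup_delay_ms(3.0, cpu_idle=False))
    print("idle CPU, unblocked at 3 ms:", wakeup_delay_ms(3.0, cpu_idle=True))
    print("busy CPU, mean over one tick:", mean_delay_ms(10, cpu_idle=False))
```

Under this model a process unblocked on a busy CPU waits about half a tick on
average and just under a full tick at worst, while on an idle CPU it does not
wait at all.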
>>>>
>>>>The double origin3800 has 1024 CPUs. One partition (P7) is sized at 512
>>>>processors, of which 500 can be used simultaneously to run a single cc-NUMA
>>>>program.
>>>
>>>now one more thing to mention here. Please don't start saying how good linux is
>>>compared to other OSes.
>>>
>>>For cc-NUMA it is simply a joke. Its latency and scheduling performance is
>>>very, very poor compared to IRIX.
>>>
>>>Of all OSes on any parallel machine, I would pick IRIX blindfolded for
>>>performance.
>>>
>>>Linux has a long way to go and will never reach that. GCC never did either,
>>>and has no hope of reaching it.
>>
>>Linux _will_ get there.  The compiler has _nothing_ to do with NUMA issues, so
>>GCC is a moot point.
>
>Linux will never get there because it is not hardware specific.

Let me give you yet another big clue.

IRIX is _not_ hardware specific either.  It is yet another port of the AT&T
Unix kernel.

Please grow up.


>
>To give an example: the ALTIX3000 hardware is something Linux will never learn
>to work with. Please look at the hardware picture in the presentation from
>Dr Peter Michielse (SGI Netherlands):
>  http://www.sara.nl/news/recent/20030703/seminar010703_Michielse_SGI.pdf
>
>It simply doesn't know which router connects to which SHUB, or that routers are
>more expensive to go through than a SHUB interconnect is.
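
[Sketch: the point in the quoted text above, that going through a router costs
more than staying on a SHUB interconnect, can be illustrated with a toy
placement routine in Python. The topology, the cost numbers and every function
name here are hypothetical; this is not the real Altix layout and not how IRIX
or Linux actually places jobs.]

```python
from itertools import combinations

# Assumed relative costs: same hub < same router < across routers.
SAME_HUB, SAME_ROUTER, CROSS_ROUTER = 1, 2, 4


def make_topology(routers=2, hubs_per_router=2, cpus_per_hub=4):
    """Map cpu id -> (router id, hub id) for a made-up machine."""
    topo, cpu = {}, 0
    for r in range(routers):
        for h in range(hubs_per_router):
            for _ in range(cpus_per_hub):
                topo[cpu] = (r, h)
                cpu += 1
    return topo


def hop_cost(topo, a, b):
    """Relative cost of communication between two CPUs."""
    (ra, ha), (rb, hb) = topo[a], topo[b]
    if ra != rb:
        return CROSS_ROUTER
    return SAME_HUB if ha == hb else SAME_ROUTER


def placement_cost(topo, cpus):
    """Sum of pairwise costs for a set of CPUs running one job."""
    return sum(hop_cost(topo, a, b) for a, b in combinations(cpus, 2))


def naive_placement(free, n):
    """Topology-blind choice: the n lowest-numbered free CPUs."""
    return sorted(free)[:n]


def topology_aware_placement(topo, free, n):
    """Greedy choice: grow a set from each seed, keep the cheapest set."""
    if len(free) < n:
        raise ValueError("not enough free CPUs")
    best, best_cost = None, None
    for seed in sorted(free):
        chosen, remaining = [seed], set(free) - {seed}
        while len(chosen) < n:
            # Add the free CPU that is cheapest to reach from those chosen.
            nxt = min(sorted(remaining),
                      key=lambda c: sum(hop_cost(topo, c, x) for x in chosen))
            chosen.append(nxt)
            remaining.remove(nxt)
        cost = placement_cost(topo, chosen)
        if best_cost is None or cost < best_cost:
            best, best_cost = sorted(chosen), cost
    return best


if __name__ == "__main__":
    topo = make_topology()
    free = {2, 3, 6, 9, 12, 13, 14, 15}  # a partly loaded machine
    for name, cpus in (("naive", naive_placement(free, 4)),
                       ("aware", topology_aware_placement(topo, free, 4))):
        print(name, cpus, placement_cost(topo, cpus))
```

The greedy search is only one simple way to use the cost table; the point of
the sketch is that a scheduler which ignores the table can spread a job across
routers even when a whole hub is free.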

Again, and for the last time, Linux is driven by both (a) the kernel developers
and (b) user demands.  There is _no_ user demand for large NUMA machines, at
the moment.  Linux was not originally SMP-capable either, because it was
designed for the PC and the PC was non-SMP for years.  But SMP boxes came along,
and demand for support was created, and it works _well_ now.  It is as good at
SMP computing as _any_ O/S around.

When the demand for larger NUMA machines reaches some threshold, Linux will
support them just fine.  Linux has _plenty_ of hardware-specific features.
From SMP, to the APIC, to memory management that is hardware specific, to
you-name-it.


>
>So when you start a new job of, say, 12 CPUs on a partly loaded machine, it
>simply doesn't know how to schedule them efficiently on the machine.

See above.  Use your head for something besides holding a hat.  Until there
is a demand, there will be no product.  A vendor is interested in selling
machines, which means he has to have an O/S that works on his specialized
hardware.  However, try to buy an SGI _without_ an O/S.  :)  Then you will
see why Linux is not so interesting.

>
>If you do not understand this *basic* NUMA problem then you will *never* get a
>clue about scheduling *ever*.

I have understood the basic NUMA problem for 20 years...


>
>It's like a compiler not taking into account that a processor is using fall
>through as a basic branch prediction mechanism, that it doesn't know how to
>avoid partial register stalls, and that it doesn't know about branches getting
>a lot of penalty.
>
>That is basic processor knowledge, just like the build up of a machine is basic
>scheduling knowledge.
>
>Now don't blame SGI on this. The machine is GREAT. It is simply an improved
>origin3800 system and IRIX schedules very well at it.
>
>It is trivial that a SHUB is more efficient, cheaper and faster than extra
>routers.
>
>Linux kernel and GCC are well known for not being as hardware specific as
>rivalling OSes and products.

That is _the point_ of Linux, of course.  It runs where SGI IRIX won't.
It also runs on normal SGI hardware just fine, and is better than IRIX
in that arena.  When there is pressure for NUMA linux, it will arrive.


>
>To this day that's why GCC is slower than other compilers, but it still does a
>good job compared to how poorly Linux is doing on such 64 processor cc-NUMA
>machines.
>
>Over the last months I have witnessed that with the first NUMA kernels from
>Linux. We're talking about such a dumb way of scheduling here that it could
>have been scheduled better for latency by a factor of 2.
>
>Best regards,
>Vincent