Author: Robert Hyatt
Date: 19:52:19 07/12/03
On July 12, 2003 at 15:29:30, Vincent Diepeveen wrote:

>On July 11, 2003 at 13:02:12, Robert Hyatt wrote:
>
>>On July 11, 2003 at 06:26:12, Vincent Diepeveen wrote:
>>
>>>On July 10, 2003 at 16:50:45, Robert Hyatt wrote:
>>>
>>>Pingpong does *not* need a processor to get woken up. I don't know which OS
>>>you use that might be doing that, but ping-pong for MPI, when properly
>>>implemented, does *not* wake up processes at all.
>>
>>Vincent, that just shows how _little_ you understand about what is going
>>on. When you send a packet to a remote machine, _something_ has to read
>>that packet. That _something_ is a process, and it is blocked until the
>>packet arrives.
>>
>>That's why my ping-pong test is done differently.
>>
>>>I hope you will understand that. If ping-pong had to wait for a process to
>>>wake up, it would run at 10 ms, because a process can wake up at most 100
>>>times a second in linux (and all *nix flavours), as the scheduler runs at
>>>100Hz.
>>
>>That is _incorrect_.
>>
>>Unix (Linux in particular) will only context switch every 10ms or so, to
>
>Being someone with a bad memory, you really should write down more accurately
>the stuff you already guessed wrong in the past. A month or six ago you
>guessed it wrong here in CCC. Now you guess it wrong again.
>
>To refresh your memory: a month or six ago I posted, very upset, that letting
>my processes idle using spin_lock performed very badly.

Can't help that. I use spinlocks to avoid blocking/unblocking a thread. It is
a significant performance boost. It _always_ has been a significant performance
boost. You just have to do it _right_.

>I had forwarded my questions on this to the IRIX OS guys. Note that they use
>linux too on the new systems, so this was a very on-topic question to post
>about at the time. In fact this is relevant to redhat linux 7.2 too.
>If not directly unlocked (within 600 ns or so) within the kernel, getting a
>new lock from the system means that a locked process, when acquiring the lock,
>gets put back in the run_queue.
>
>The run_queue executes at 100 Hz in all *nix systems. The reason for that,
>even though they could make it way faster nowadays, is that some very
>important software assumes it is running at 100Hz.

No, you misunderstand what is going on. The 10ms is what you can see _if_ the
CPU is currently busy doing something else, such as executing a lower-priority
task, and a higher-priority task suddenly unblocks for whatever reason. We
don't want to context switch for such cases too often, and 10ms (100 context
switches a second _voluntarily_) is the target.

But the 10ms is _not_ a static, unchangeable constant. If a process blocks
after 1ms, another process executes _immediately_. It does _not_ wait another
9ms to get scheduled. _Never._

If nothing is running, and the CPU is in the idle loop, when a process
unblocks it is scheduled _immediately_, just as quickly as the context can be
set up and control passed to the process. Simple. Explained in _any_ O/S book.
Just look it up and stop quoting that 10ms as though it were some sort of
limit.

When _I_ play chess on ICC, I _never_ suffer a 10ms latency. I am the _only_
thing running on my processors. But I use spin locks to avoid the time needed
to load the process context and give it control, by simply spinning so I
_never_ give up control in the first place.

>Therefore the latency is at least 10 ms.

Only when the processor has _other_ things to run. And then latency is
irrelevant, since you are not getting the CPU to yourself _anyway_.

>When I posted that here, upset, I got as an answer that windows was not a
>hair better, as it had a typical latency of 15 ms to wake up. I did not get
>that from the NT kernel team, but from someone posting here.
>
>I am getting very sick of you coming back to the same wrong lemmas each time.
I'm also getting very sick of your posting crap, because you don't understand
what you are talking about.

>This was posted very clearly.
>
>Trivially a searching thread of crafty/diep will already be running on the
>processor.
>
>It is clear that MPI does not need process wake-up time to work. If it did,
>the researchers would be mad to use MPI.

Ping-pong _does_ need to be woken up, however. That was my point.

>It is *not* doing that however.
>
>Getting put in the run_queue each time is no fun :)
>
>>control context switching overhead. But if _nothing_ is ready to run, and
>>a process unblocks, it does _not_ wait another 10ms before running. It
>>runs _right now_.
>>
>>>If you do not care for the pingpong test, then you obviously do not care
>>>for chessprograms as well. Because if you need a hashtable entry, you
>>>definitely need that latency test to measure.
>>
>>No, I just run the ping pong test _correctly_ to see what the real latency
>>of the hardware is. Software/scheduling latency is _another_ issue that may
>>or may not affect a chess program. I.e., I don't do blocking I/O, so I don't
>>have to get "woken up", ever.
>>
>>>Note that all HPC professors include pingpong in the first tests they use
>>>to measure supercomputers/clusters.
>>>
>>>In fact, of the x cluster/supercomputer dudes I asked about the pingpong
>>>test, I got an answer within a second from all of them. For them the
>>>pingpong test *is* very relevant.
>>
>>It is relevant. How it is done is _also_ relevant.
>>
>>If you measure latency as you are doing, you will get different answers on
>>different systems. Windows and Linux are different, and you are measuring
>>both software _and_ hardware latency with your method. Same problem if you
>>compare windows to solaris, or IRIX, or any other unix variant (or non-unix
>>if you want.)
>>My latency measure answers the following question:
>>
>>"How long does it take me to send a packet to the remote machine and have
>>it receive it?" Independent of the Operating System. Independent of the
>>application. Just "what can the hardware do best-case?"
>>
>>I know _that_ number precisely. And if I want to write a chess program that
>>uses that hardware, and I want _that_ latency, I can certainly get it by
>>doing my own direct hardware interface. If I want to be more portable, and
>>absorb some O/S latency on top of the hardware latency, I'll do it a
>>different way.
>>
>>However, I _have_ been using the term "hardware latency" and I have most
>>certainly measured it very precisely. And if you'd like to see me bounce
>>a packet back and forth in just over a usec, I'll be happy to do so.
>>
>>Whether you can do it or not is irrelevant, of course.
>>
>>Because I didn't say _you_ could do it. I said _I_ had _done_ it.

I notice you still don't respond to the parts you started talking about?
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.