Computer Chess Club Archives



Subject: Re: New intel 64 bit ?

Author: Robert Hyatt

Date: 19:52:19 07/12/03



On July 12, 2003 at 15:29:30, Vincent Diepeveen wrote:

>On July 11, 2003 at 13:02:12, Robert Hyatt wrote:
>
>>On July 11, 2003 at 06:26:12, Vincent Diepeveen wrote:
>>
>>>On July 10, 2003 at 16:50:45, Robert Hyatt wrote:
>>>
>>>Pingpong does *not* need a processor to get wakened up. Don't know which OS you
>>>use that might be doing that, but the ping pong for MPI when properly
>>>implemented does *not* wake up processes at all.
>>
>>Vincent, that just shows how _little_ you understand about what is going
>>on.  When you send a packet to a remote machine, _something_ has to read
>>that packet.  That _something_ is a process, and it is blocked until the
>>packet arrives.
>>
>>That's why my ping-pong test is done differently.
>>
>>
>>>
>>>I hope you will understand that. If pingpong would wait for a process to wake up
>>>it would run at 10 ms because a process can wake up at most 100 times a second
>>>in linux (and all *nix flavours) as the scheduler runs 100Hz.
>>
>>That is _incorrect_.
>>
>>Unix (Linux in particular) will only context switch every 10ms or so, to
>
>Being someone with a bad memory, you really should write down more accurately
>the things you have already guessed wrong in the past. About six months ago you
>guessed it wrong here in CCC. Now you guess it wrong again.
>
>To refresh your memory: about six months ago I posted here, very upset, that
>letting my processes idle using spin_lock performed very badly.

Can't help that.  I use spinlocks to avoid blocking/unblocking a thread.  It
is a significant performance boost.  It _always_ has been a significant
performance boost.  You just have to do it _right_.

>
>I had forwarded my questions on this to the IRIX OS guys. Note that they use
>linux too at the new systems, so this was a very on topic question at the time
>to post about. In fact this is relevant to redhat linux 7.2 too.
>
>If not directly unlocked (within 600 ns or so) within the kernel, getting a new
>lock from the system means that a locked process when achieving the lock gets
>put back in the run_queue.
>
>The run_queue executes at 100 Hz in all *nix systems. The reason for that, even
>though they could make it way faster nowadays, is that some very important
>software is assuming it is running at 100Hz.

No, you misunderstand what is going on.  The 10ms is what you can see
_if_ the CPU is currently busy doing something else, such as executing a
lower-priority task, and a higher-priority task suddenly unblocks for
whatever reason.  We don't want to context switch for such cases too often,
and 10ms (100 context switches a second _voluntarily_) is the target.  But the
10ms is _not_ a static unchangeable constant.  If a process blocks after 1ms,
another process executes _immediately_.  It does -not- wait for another 9ms
to get scheduled.  _never_.  If nothing is running, and the CPU is in the
idle loop, when a process unblocks it is scheduled _immediately_, just as
quickly as the context can be set up and control passed to the process.

Simple.  Explained in _any_ O/S book.  Just look it up and stop quoting
that 10ms as though it was some sort of limit.  When _I_ play chess on ICC,
I _never_ suffer from 10ms latency.  I am the _only_ thing running on my
processors.  But I use spin locks to avoid the time needed to load the
process context and give it control, by simply spinning so I _never_ give
up control in the first place.
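
As a rough illustration of the two waiting styles contrasted above, here is a minimal Python sketch. The function names `spin_wait_latency` and `blocking_wait_latency` are hypothetical, not taken from Crafty or any other engine. One caveat: under CPython's global interpreter lock the spinning thread competes with the thread that sets the flag, so the timings are not representative of a native spinlock. The sketch only shows the structure: one waiter never gives up the CPU, the other blocks and has to be woken.

```python
import threading
import time


def spin_wait_latency(delay_s=0.05):
    """Waiter busy-loops on a flag and never blocks in the kernel."""
    flag = [False]
    seen_at = []

    def waiter():
        while not flag[0]:  # spin: keep the CPU instead of sleeping
            pass
        seen_at.append(time.perf_counter())

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay_s)          # let the waiter settle into its loop
    set_at = time.perf_counter()
    flag[0] = True               # "release the lock"
    t.join()
    return seen_at[0] - set_at


def blocking_wait_latency(delay_s=0.05):
    """Waiter blocks on an Event and must be woken by the runtime/OS."""
    event = threading.Event()
    seen_at = []

    def waiter():
        event.wait()             # blocked until set() is called
        seen_at.append(time.perf_counter())

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay_s)
    set_at = time.perf_counter()
    event.set()
    t.join()
    return seen_at[0] - set_at


spin = spin_wait_latency()
block = blocking_wait_latency()
print(f"spin wait:     {spin * 1e6:.1f} us")
print(f"blocking wait: {block * 1e6:.1f} us")
```

In a native implementation the spinning thread would sit on its own processor, which is the situation described above.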


>
>Therefore the latency is at least 10 ms.

Only when the processor has _other_ things to run.  Then latency is
irrelevant since you are not getting the CPU to yourself _anyway_.
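
The scheduling argument above can be sketched as a toy round-robin model. This is an assumption-laden illustration, not how any real kernel is implemented: `QUANTUM_MS`, `simulate`, and `first_start` are hypothetical names, and the model ignores priorities and context-switch cost. It shows only that the 10ms figure is an upper bound on a timeslice: a task that blocks early hands over the CPU at once, while a task that keeps running makes the next one wait for the quantum to expire.

```python
from collections import deque

QUANTUM_MS = 10  # assumed maximum timeslice, matching the 10ms figure above


def simulate(bursts):
    """Toy round-robin scheduler.

    bursts maps a task name to the CPU time (ms) it wants before it blocks.
    Returns a timeline of (start_ms, name, ran_ms) entries.
    """
    queue = deque(bursts.items())
    clock = 0
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(remaining, QUANTUM_MS)  # run until it blocks or is preempted
        timeline.append((clock, name, ran))
        clock += ran
        if remaining > ran:
            # Quantum expired: preempt and requeue the unfinished task.
            queue.append((name, remaining - ran))
    return timeline


def first_start(timeline, name):
    """Time (ms) at which the named task first gets the CPU."""
    return next(start for start, task, _ in timeline if task == name)


# A blocks after 1ms, so B runs right away instead of waiting out the quantum.
early_block = simulate({"A": 1, "B": 25})
# A keeps the CPU busy, so B waits for the full 10ms quantum to expire.
cpu_busy = simulate({"A": 25, "B": 1})

print(first_start(early_block, "B"))
print(first_start(cpu_busy, "B"))
```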

>
>When I posted that here, upset, the answer I got was that Windows was not a hair
>better, as it had a typical latency of 15 ms to wake up. I did not get that from
>the NT kernel team, but from someone posting here.
>
>I am getting very sick of you coming back to the same wrong lemmas each time.
>

I'm also getting very sick of your posting crap because you don't understand
what you are talking about.


>This was posted very clearly.
>
>Trivially a searching thread of crafty/diep will be running already at the
>processor.
>
>It is clear that the MPI is not needing wake up time of a process to work. If it
>would then the researchers would be mad to use MPI.

ping-pong _does_ need to be woken up however.  That was my point.
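
For readers unfamiliar with the test, a ping-pong measurement can be sketched as follows. This is a minimal Python version over a local socket pair, with `ping_pong` a hypothetical name. It is not the MPI benchmark or the direct hardware test discussed in this thread. Because the echo side sits in a blocking `recv`, it has to be woken for every packet, so the number it reports includes software and scheduling latency, not just hardware latency.

```python
import socket
import threading
import time


def ping_pong(rounds=1000):
    """Return the mean round-trip time in seconds over a local socket pair."""
    a, b = socket.socketpair()

    def echo():
        for _ in range(rounds):
            data = b.recv(1)   # blocks until the ping arrives
            b.sendall(data)    # send the pong straight back

    t = threading.Thread(target=echo)
    t.start()

    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(b"x")        # ping
        a.recv(1)              # wait for the pong
    elapsed = time.perf_counter() - start

    t.join()
    a.close()
    b.close()
    return elapsed / rounds


rtt = ping_pong()
print(f"mean round trip: {rtt * 1e6:.1f} us")
```

Replacing the blocking `recv` with a polling loop would remove the wake-up from the measurement, which is the distinction being argued about here.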

>
>It is *not* doing that however.
>
>Getting put in the run_queue each time is no fun :)
>
>>control context switching overhead.  But if _nothing_ is ready to run, and
>>a process unblocks, it does _not_ wait another 10ms before running.  It
>>runs _right now_.
>>
>>
>>>
>>>If you do not care for the pingpong test then you obviously do not care for
>>>chessprograms as well. Because if you need a hashtable entry you definitely need
>>>that latency test to measure.
>>
>>No, I just run the ping pong test _correctly_ to see what the real latency
>>of the hardware is.  Software/scheduling latency is _another_ issue that may
>>or may not affect a chess program.  I.e., I don't do blocking I/O, so I don't
>>have to get "woken up" ever.
>>
>>>
>>>Note that all HPC professors do include pingpong in the first tests they use to
>>>measure supercomputers/clusters.
>>>
>>>In fact out of the x cluster/supercomputer dudes i asked after pingpong test i
>>>got within a second answer out of all of them. For them the pingpong test *is*
>>>very relevant.
>>
>>It is relevant.  How it is done is _also_ relevant.
>>
>>If you measure latency as you are doing, you will get different answers on
>>different systems.  Windows and Linux are different and you are measuring
>>both software _and_ hardware latency with your method.  Same problem if you
>>compare windows to solaris, or IRIX, or any other unix variant (or non-unix
>>if you want.)
>>
>>My latency measure answers the following question:
>>
>>"How long does it take me to send a packet to the remote machine and have
>>it receive it?"  Independent of the Operating System.  Independent of the
>>application.  Just "what can the hardware do best-case?"
>>
>>I know _that_ number precisely.  And if I want to write a chess program that
>>uses that hardware, and I want _that_ latency, I can certainly get it by doing
>>my own direct hardware interface.  If I want to be more portable, and absorb
>>some O/S latency on top of the hardware latency, I'll do it a different way.
>>
>>However, I _have_ been using the term "hardware latency" and I have most
>>certainly measured it very precisely.  And if you'd like to see me bounce
>>a packet back and forth in just over a usec, I'll be happy to do so.
>>
>>Whether you can do it or not is irrelevant, of course.
>>
>>Because I didn't say _you_ could do it.  I said _I_ had _done_ it.


I notice you still don't respond to the parts you started talking about?





Last modified: Thu, 15 Apr 21 08:11:13 -0700
