Author: Robert Hyatt
Date: 13:50:45 07/10/03
On July 09, 2003 at 18:42:25, Vincent Diepeveen wrote:

>On July 09, 2003 at 16:02:11, Robert Hyatt wrote:
>
>>On July 09, 2003 at 00:23:52, Vincent Diepeveen wrote:
>>
>>>On July 08, 2003 at 11:58:58, Robert Hyatt wrote:
>>>
>>>>On July 08, 2003 at 08:49:48, Vincent Diepeveen wrote:
>>>>
>>>>>On July 07, 2003 at 10:48:02, Robert Hyatt wrote:
>>>>>
>>>>>>On July 05, 2003 at 23:37:47, Jay Urbanski wrote:
>>>>>>
>>>>>>>On July 04, 2003 at 23:33:46, Robert Hyatt wrote:
>>>>>>>
>>>>>>><snip>
>>>>>>>>"way better than MPI". Both use TCP/IP, just like PVM. Except that MPI/OpenMP
>>>>>>>>is designed for homogeneous clusters while PVM works with heterogeneous mixes.
>>>>>>>>But for any of the above, the latency is caused by TCP/IP, _not_ the particular
>>>>>>>>library being used.
>>>>>>>
>>>>>>>With latency a concern I don't know why you'd use TCP/IP as the transport for
>>>>>>>MPI when there are much faster ones available.
>>>>>>>
>>>>>>>Even VIA over Ethernet would be an improvement.
>>>>>>
>>>>>>I use VIA over ethernet, and VIA over a cLAN giganet switch as well. The
>>>>>>cLAN hardware produces .5 usec latency, which is about 1000X better than any
>>>>>
>>>>>Bob, the latencies that I quote are RASML: Random Average Shared Memory
>>>>>Latencies.
>>>>>
>>>>>The latencies that you quote here are sequential latencies. Bandwidth divided by
>>>>>the number of seconds = latency (according to the manufacturers).
>>>>
>>>>No it isn't. It is computed by _me_, by randomly sending packets to different
>>>>nodes on this cluster and measuring the latency. I'm not interested in any
>>>
>>>You need to ship a packet and then WAIT for it to get back. The simplest test is
>>>using one-way ping-pong. I will email you that program now.
>>>
>>>You will see about a 20-30 usec latency then.
>>
>>Want to bet? How about "the loser stops posting here?"
>>
>>>
>>>>kind of bandwidth number. I _know_ that is high. It is high on a gigabit
>>>>ethernet switch. I'm interested in the latency: how long does it take me to
>>>>get a packet from A to B, and there ethernet (including gigabit) is slow.
>>>
>>>>The cLAN with VIA is not.
>>>
>>>>IE on this particular cluster, it takes about 1/2 usec to get a short
>>>>packet from A to B. The longer the packet, the longer the latency, since I
>>>>assume that I need the last byte before I can use the first byte, which
>>>>might not always be true.
>>>
>>>Bob, this is not one-way ping-pong latency. Not to mention that it isn't a full
>>>ship and receive.
>>
>>So what. Ping-pong is the _only_ way I know to measure latency. I told you
>>that is what I did. What is your problem with understanding that?
>
>Bob, on this planet there are a thousand machines using the network cards you
>have, and there are guys in Italy who are busy making their own protocols in
>order to get faster latencies, and they *manage*.
>
>Now stop bragging about something you don't know. You do *not* know one-way
>ping-pong latencies.

Actually, I _DO_ know one-way ping-pong latencies. Of course, you seem to know
everything about everything, including everything everybody _else_ knows, so
that's an argument that can't be won. But just because you say it does _not_
make it so. I knew what "ping pong" latency was before you were _born_.

>
>When you just got your machine I asked you: "what is the latency of this thing?"
>
>Then you took a while to get the manufacturer specs and then came back
>with: "0.5 usec".

No, I took a while to go run the test. If you were to ask me the max
sustainable I/O rate, I would do the same thing.

>
>However that's as we all know bandwidth divided by time.

No it isn't. Latency has _nothing_ to do with bandwidth in the context of
networking. And _nobody_ I know of computes it that way.

>
>A very poor understanding of latency.
>
>Here is what pingpong as we can use it does. It ships 8 bytes and then waits for
>those 8 bytes to get back.
>
>After that it ships again 8 bytes and then waits for those 8 bytes to get back.
>
>If you want to you may make that 4 bytes too. I don't care.

That is _exactly_ what my latency measurement does. As I have now said for the
_fourth_ time.

>
>The number of times you can do those shipments a second is called n.
>
>the latency in microseconds = 1 million / n
>
>So don't quote the same thing you quoted a bunch of years ago again.
>
>That's not the latency we're looking after. Marketing managers have rewritten
>and rewritten that definition until they had something very fast.

You can define latency however you want. I use _the_ definition that everybody
else uses, however, and will continue to do so. Latency is the time taken to
send a packet from A to B. One way to measure it is to do the ping-pong test,
although that is _not_ an accurate measurement. If you want me to explain why,
I will be happy to do so. But to make it simple, that kind of ping-pong test
measures _more_ than latency. Namely, it includes the time needed to wake up
and schedule a process on the other end, which is _not_ part of the latency.
Of course, you won't understand that... but I thought I'd try.
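So that we are at least arguing about the same test, here is a bare-bones
sketch of the ping-pong program as described above, written against plain TCP
sockets purely for illustration. The port, host name, and iteration count are
placeholders, and this is _not_ the VIA code I run on the cluster. It reports
the round trip, the round trip divided by 2 (the "one way" figure), and the
messages-per-second count n that the 1,000,000 / n formula above refers to.
Note that over TCP it also times the protocol stack and the wakeup/scheduling
of the process on the far end, which is exactly the overstatement I described.

/*
 * Minimal TCP ping-pong latency sketch (illustration only).
 *
 *   machine B:  ./pingpong server 5000
 *   machine A:  ./pingpong client <machineB> 5000
 */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

#define N 100000                        /* number of 8-byte round trips    */

static double usec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(int argc, char **argv) {
    char buf[8] = {0};
    int one = 1;

    if (argc == 3 && !strcmp(argv[1], "server")) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = {0};
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = INADDR_ANY;
        a.sin_port = htons(atoi(argv[2]));
        bind(s, (struct sockaddr *)&a, sizeof(a));
        listen(s, 1);
        int c = accept(s, NULL, NULL);
        setsockopt(c, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        while (recv(c, buf, 8, MSG_WAITALL) == 8)   /* echo 8 bytes back   */
            send(c, buf, 8, 0);
    } else if (argc == 4 && !strcmp(argv[1], "client")) {
        struct hostent *h = gethostbyname(argv[2]);
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = {0};
        a.sin_family = AF_INET;
        memcpy(&a.sin_addr, h->h_addr_list[0], h->h_length);
        a.sin_port = htons(atoi(argv[3]));
        connect(s, (struct sockaddr *)&a, sizeof(a));
        setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        double t0 = usec();
        for (int i = 0; i < N; i++) {               /* ship 8, wait for 8  */
            send(s, buf, 8, 0);
            recv(s, buf, 8, MSG_WAITALL);
        }
        double rt = (usec() - t0) / N;
        printf("round trip %.2f usec, one-way (rt/2) %.2f usec, n = %d msgs/sec\n",
               rt, rt / 2.0, (int)(1e6 / rt));
    } else {
        fprintf(stderr, "usage: %s server <port> | client <host> <port>\n", argv[0]);
        return 1;
    }
    return 0;
}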
>
>For your information your own machines have, if I remember well, 66Mhz PCI cards
>or something. That's cool cards, they're much better than 33Mhz cards. That
>means that the latency of the PCI bus, which is about 4 usec, is added to that of
>the network cards when you do the measurement as above.
>
>Is that new for you?

Yes, and it is wrong. Here is a test I just ran: I did an "scp" copy of
kqbkqb.nbw.emd from machine A to machine B. That is almost 1 gigabyte of data,
and it took 7.9 seconds to complete. To do that copy, that "slow PCI bus" had
to do the following:

1. Deliver a copy of 1 gigabyte from disk to memory.
2. Deliver a copy of 1 gigabyte from memory to the CPU for encryption.
3. Deliver a copy of 1 gigabyte from the CPU back to memory (this is the
   encrypted data).
4. Deliver 1 gigabyte from memory to the CPU (this is the TCP/IP layer copying
   and stuffing the data into packets).
5. Deliver 1 gigabyte of data from the CPU to memory (this is the other half of
   the data copying, to get the stuff into TCP/IP packet buffers).
6. Deliver 1 gigabyte of data from memory to the network card.

Your 4 usec numbers are a bit distorted. Your 250,000 "messages per second" is
not just distorted, but _wrong_. My machine moved about 6 gigabytes of data in
7 seconds, and much of that delay was in the SCSI disk reads and the network
writes (this is on a gigabit network). So please don't quote me any of your
nonsense numbers, it is far easier to run the tests. If you want the lm-bench
numbers for memory speeds, I can easily provide that. Without any hand-waving.
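To put rough numbers on that, using nothing but the figures above (decimal
gigabytes, nothing measured beyond what is already quoted):

   6 trips x ~1 GB each   =  ~6 GB of memory/PCI traffic
   ~6 GB / 7.9 seconds    =  roughly 760 MB/sec sustained through that "slow"
                             bus and memory system
   ~1 GB / 7.9 seconds    =  roughly 120 MB/sec actually delivered over the
                             gigabit link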
>
>I am sure it should be as you're just quoting the same number I hear already
>several years from you.

I have had this cluster for 2.5 years. You have been hearing the same number
from me repeatedly since I got it. Not before.

>
>Of course you didn't do a pingpong. If you get under 1 microsecond from node to
>node you made a mistake. Latency of PCI is already way above that.

As I said, you _must_ know what you are doing to measure this stuff. You are
factoring in way more than PCI latency. Which is not a surprise, since you
don't know beans about operating systems.

>
>Now the above latency time which we need in order to know what it takes to send
>and receive a message, is divided by 2 by the pingpong program. That's called
>'one way pingpong' then.
>
>So better pack your bags.

Righto. You seem to confuse "ping pong" with a game played with two paddles, a
net, and a small white ball. But your latency measurement is not the way to do
it. One day I'll tell you how _I_ do the ping-pong test, which _really_
measures latency. _Not_ the way you do it, by the way....

>
>>>
>>>In computerchess you don't ship something without waiting for answer back.
>>>You *want* answer back.
>>>
>>>Example if you want to split a node :)
>>
>>Wrong. It is not hard to do this. I say "do this" and that is all I need
>>to do until I get the result back. I don't need a "OK, I got that, I'll
>>be back with the answer in a while." It is easier to just keep going until
>>the answer arrives back.
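In case the "fire and forget" split in that quoted paragraph is not clear, the
pattern looks roughly like the sketch below. It uses MPI non-blocking calls
only because they keep the example short; my cluster code talks VIA, not MPI,
and the message tags and payloads here are made up for illustration.

/*
 * Fire-and-forget "split" sketch (illustration only): rank 0 ships the work
 * and keeps computing, never waiting for an acknowledgement; it only picks
 * up the answer whenever the answer happens to arrive.
 *
 *   build/run:  mpicc split.c -o split && mpirun -np 2 ./split
 */
#include <mpi.h>
#include <stdio.h>

#define TAG_SPLIT  1                 /* "do this" work message (made up)   */
#define TAG_RESULT 2                 /* the answer that comes back later   */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 /* the node doing the split            */
        long work = 42, result;
        int have_answer = 0;
        MPI_Request req;

        /* Say "do this" and do NOT wait for an "OK, I got that".           */
        MPI_Isend(&work, 1, MPI_LONG, 1, TAG_SPLIT, MPI_COMM_WORLD, &req);

        /* Keep going locally; just poll now and then for the answer.       */
        while (!have_answer) {
            /* ... a real program keeps searching its own subtree here ...  */
            MPI_Iprobe(1, TAG_RESULT, MPI_COMM_WORLD, &have_answer,
                       MPI_STATUS_IGNORE);
        }
        MPI_Recv(&result, 1, MPI_LONG, 1, TAG_RESULT,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("helper answered: %ld\n", result);
    } else if (rank == 1) {          /* the node that was handed the work   */
        long work, result;
        MPI_Recv(&work, 1, MPI_LONG, 0, TAG_SPLIT,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        result = work + 1;           /* stand-in for searching the split    */
        MPI_Send(&result, 1, MPI_LONG, 0, TAG_RESULT, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}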
>>
>>>
>>>The 0.5 usec latency is based upon shipping a terabyte data without answer back.
>>
>>No it isn't.
>>
>>>
>>>Bandwidth / time needed = latency then.
>>>
>>>What I tried to explain to you is RASML, but I know you won't understand it.
>>>
>>>In order to waste time onto this I'll just email the thing to you.
>>>
>>>Run it any time you like, but run it on 2 different nodes. Don't run it at the
>>>same node :)
>>
>>You sent me some MPI crap that I'm not going to fool with. As I said, I
>>use VIA to use the cLAN stuff. VIA. Not MPI.
>>
>>But I'm not going to waste time running your crap anyway, as whenever I do it
>>and you don't like the results, you just disappear for a while.
>>
>>>
>>>>VIA has some cute stuff to "share memory" too.
>>>>
>>>>>
>>>>>For computer chess that can't be used however.
>>>>>
>>>>>You can more accurately get an indication by using the well known ping pong
>>>>>program. What it does is over MPI it ships messages and then WAITS for them to
>>>>>come back. Then it divides that time by 2. Then it is called one way ping pong
>>>>>latencies.
>>>>
>>>>That's how _I_ measure latency. I know of no other way, since keeping two
>>>>machine clocks synced that accurately is not easy.
>>>>
>>>>>
>>>>>If you multiply that by 2, you already get closer to the latency that it takes
>>>>>to get a single bitboard out of memory.
>>>>
>>>>It doesn't take me .5 usec to get a bitboard out of memory. Unless you are
>>>>talking about a NUMA machine where machine A wants the bitboard and it is
>>>>not in its local memory.
>>>>
>>>>>
>>>>>Even better is using the RASML test I wrote. That's using OpenMP, though
>>>>>conversion to MPI is trivial (yet slowing down things so much that it is less
>>>>>accurate than OpenMP).
>>>>>
>>>>>So the best indication you can get is by doing a simple pingpong latency test.
>>>>
>>>>I do this all the time.
>>>>
>>>>>
>>>>>The best ethernet network cards are Myrinet network cards (about $1300). I do
>>>>>not know which chipset they have. They can achieve at 133Mhz PCI64X (Jay might
>>>>>know more about specifications here) like 5 usec one way ping pong latency, so
>>>>>that's a minimum of way more than 10 usec to get a bitboard from the other side
>>>>>of the machine.
>>>>
>>>>Correct. cLAN is faster. It is also more expensive. The 8-port switch we
>>>>use cost us about $18,000 two years ago. Myrinet was designed as a lower-cost
>>>>network, with somewhat lower performance.
>>>>
>>>>>
>>>>>In your cluster you probably do not have such PCI stuff Bob. Most likely it is
>>>>>around 10 usec for one way latency at your cluster, so you get a minimum of
>>>>>20 usec to get a message.
>>>>
>>>>In my cluster I have PCI cards that are faster than Myrinet. They were made by
>>>>cLAN (again) and we paid about $1,500 each for them two years ago. Again, you
>>>>can find info about the cLAN stuff and compare it to Myrinet if you want. We
>>>>have Myrinet stuff here on campus (not in any of my labs) and we have done the
>>>>comparisons. When we write proposals to NSF, they _always_ push us towards
>>>>Myrinet because it is cheaper than the cLAN stuff, but it also is lower
>>>>performance.
>>>>
>>>>>
>>>>>Note that getting a cache line out of local memory of your quad Xeons is
>>>>>already taking about 0.5 usec. You can imagine hopefully that the quoted usecs
>>>>>by the manufacturer for cLAN is based upon bandwidth / time needed. And NOT the
>>>>>RASM latencies.
>>>>
>>>>Your number there is dead wrong. My cluster is PIII based, with a cache
>>>>line of 32 bytes. It uses 4-way interleaving. lm_bench reports the latency
>>>>as 132 nanoseconds, _total_.
>>>>
>>>>>
>>>>>Best regards,
>>>>>Vincent
>>>>>
>>>>>
>>>>>>TCP/IP-ethernet implementation. However, ethernet will never touch good
>>>>>>hardware like the cLAN stuff.
>>>>>>
>>>>>>MPI/PVM use ethernet - tcp/ip for one obvious reason: "portability" and
>>>>>>"availability". :)