Computer Chess Club Archives


Subject: Re: New intel 64 bit ?

Author: Robert Hyatt

Date: 13:02:11 07/09/03


On July 09, 2003 at 00:23:52, Vincent Diepeveen wrote:

>On July 08, 2003 at 11:58:58, Robert Hyatt wrote:
>
>>On July 08, 2003 at 08:49:48, Vincent Diepeveen wrote:
>>
>>>On July 07, 2003 at 10:48:02, Robert Hyatt wrote:
>>>
>>>>On July 05, 2003 at 23:37:47, Jay Urbanski wrote:
>>>>
>>>>>On July 04, 2003 at 23:33:46, Robert Hyatt wrote:
>>>>>
>>>>><snip>
>>>>>>"way better than MPI".  Both use TCP/IP, just like PVM.  Except that MPI/OpenMP
>>>>>>is designed for homogeneous clusters while PVM works with heterogeneous mixes.
>>>>>>But for any of the above, the latency is caused by TCP/IP, _not_ the particular
>>>>>>library being used.
>>>>>
>>>>>With latency a concern I don't know why you'd use TCP/IP as the transport for
>>>>>MPI when there are much faster ones available.
>>>>>
>>>>>Even VIA over Ethernet would be an improvement.
>>>>
>>>>I use VIA over ethernet, and VIA over a cLAN giganet switch as well.  The
>>>>cLAN hardware produces .5usec latency which is about 1000X better than any
>>>
>>>Bob, the latencies that i quote are RASML : Random Average Shared Memory
>>>Latencies.
>>>
>>>The latencies that you quote here are sequential latencies. Bandwidth divided by
>>>the number of seconds = latency (according to the manufacturers).
>>
>>No it isn't.  It is computed by _me_.  By randomly sending packets to different
>>nodes on this cluster and measuring the latency.  I'm not interested in any
>
>You need to ship a packet and then WAIT for it to get back. the simplest test is
>using 1 way pingpong. I will email you that program now.
>
>You will see about a 20-30 usec latency then.

Want to bet?  How about "the loser stops posting here?"


>
>>kind of bandwidth number.  I _know_ that is high.  It is high on a gigabit
>>ethernet switch.  I'm interested in the latency, how long does it take me to
>>get a packet from A to B, and there ethernet (including gigabit) is slow.
>
>>The cLAN with VIA is not.
>
>>IE on this particular cluster, it takes about 1/2 usec to get a short
>>packet from A to B.  The longer the packet, the longer the latency since I
>>assume that I need the last byte before I can use the first byte, which
>>might not always be true.
>
>Bob this is not one way ping pong latency. Not to mention that it isn't a full
>ship and receive.

So what.  Ping-pong is the _only_ way I know to measure latency.  I told you
that is what I did.  What is your problem with understanding that?
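
For readers unfamiliar with the ping-pong method under discussion, here is a minimal sketch in Python. It is an illustration only, not the VIA or MPI code either poster used. It assumes a local socket pair stands in for the two nodes, so the number it reports reflects loopback and interpreter overhead, not cLAN or Myrinet hardware. The names `recv_exact`, `echo`, and `one_way_latency` are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, rounds, size):
    """Peer side: bounce every message straight back (the 'pong')."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def one_way_latency(rounds=1000, size=8):
    """Estimate one-way latency in seconds as round-trip time / 2.

    Only one clock is used, so no clock synchronization between the
    two ends is needed. The whole message must arrive before the
    reply is sent, so longer messages give longer latencies.
    """
    a, b = socket.socketpair()  # assumption: stands in for two nodes
    peer = threading.Thread(target=echo, args=(b, rounds, size))
    peer.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)    # ping
        recv_exact(a, size)   # wait for the pong
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    return elapsed / rounds / 2


if __name__ == "__main__":
    print(f"one-way latency: {one_way_latency() * 1e6:.2f} usec")
```

On a real cluster the two ends would run on different nodes, and the transport would be whatever is being measured.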

>
>In computerchess you don't ship something without waiting for answer back.
>You *want* answer back.
>
>Example if you want to split a node :)

Wrong.  It is not hard to do this.  I say "do this" and that is all I need
to do until I get the result back.  I don't need an "OK, I got that, I'll
be back with the answer in a while."  It is easier to just keep going until
the answer arrives back.
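
A rough sketch of that "send the request and keep going" pattern, in Python. This is an illustration under stated assumptions, not code from any chess program: a worker thread stands in for the remote node, and `remote_search` and `split_and_continue` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_search(position):
    """Hypothetical stand-in for a helper node searching a subtree."""
    time.sleep(0.01)       # pretend network plus search time
    return sum(position)   # pretend score


def split_and_continue(position, cap=1_000_000):
    """Hand off work, then keep searching locally until the answer arrives.

    No acknowledgement is awaited after the request is sent; the only
    message that matters is the result itself.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(remote_search, position)  # "do this"
        local_work = 0
        while not future.done() and local_work < cap:
            local_work += 1  # stand-in for one unit of local search
        return future.result(), local_work


if __name__ == "__main__":
    score, steps = split_and_continue([1, 2, 3])
    print(score, steps)
```

The point of the sketch is only that the sender's time between request and result is spent on useful work rather than on waiting for a receipt.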

>
>The 0.5 usec latency is based upon shipping a terabyte data without answer back.

No it isn't.




>
>Bandwidth / time needed = latency then.
>
>What i tried to explain to you is RASML but i know you won't understand it.
>
>In order to waste time onto this i'll just email the thing to you.
>
>Run it any time you like, But run it on 2 different nodes. Don't run it at the
>same node :)


You sent me some MPI crap that I'm not going to fool with.  As I said, I
use VIA to use the cLAN stuff.  VIA.  Not MPI.

But I'm not going to waste time running your crap anyway as whenever I do it,
and you don't like the results, you just disappear for a while.



>
>>VIA has some cute stuff to "share memory" too.
>>
>>>
>>>For computer chess that can't be used however.
>>>
>>>You can more accurate get an indication by using the well known ping pong
>>>program. What it does is over MPI it ships messages and then WAITS for them to
>>>come back. Then it divides that time by 2. Then it is called one way ping pong
>>>latencies.
>>
>>That's how _I_ measure latency.  I know of no other way, since keeping two
>>machine clocks synced that accurately is not easy.
>>
>>
>>>
>>>If you multiply that by 2, you already get closer to the latency that it takes
>>>to get a single bitboard out of memory.
>>
>>It doesn't take me .5usec to get a bitboard out of memory.  Unless you are
>>talking about a NUMA machine where machine A wants the bitboard and it is
>>not in its local memory.
>>
>>
>>>
>>>Even better is using the RASML test i wrote. That's using OpenMP though but
>>>conversion to MPI is trivial (yet slowing down things so much that it is less
>>>accurate than openmp).
>>>
>>>So the best indication you can get is by doing a simple pingpong latency test.
>>
>>I do this all the time.
>>
>>>
>>>The best ethernet network cards are Myrinet network cards (about $1300). I do not
>>>know which chipset they have. They can achieve at 133MHz PCI64X (jay might know
>>>more about specifications here) like 5 usec one way ping pong latency, so that's
>>>a minimum of way more than 10 usec to get a bitboard from the other side of the
>>>machine.
>>
>>Correct.  cLAN is faster.  It is also more expensive.  The 8-port switch we
>>use cost us about $18,000 two years ago.  Myrinet was designed as a lower-cost
>>network.  With somewhat lower performance.
>>
>>>
>>>In your cluster you probably do not have such PCI stuff Bob. Most likely it is
>>>around 10 usec for one way latency at your cluster so you can get at minimum of
>>>20 usec to get a message.
>>
>>In my cluster I have PCI cards that are faster than Myrinet.  They were made by
>>cLAN (again) and we paid about $1,500 each for them two years ago.  Again, you
>>can find info about the cLAN stuff and compare it to myrinet if you want.  We
>>have Myrinet stuff here on campus (not in any of my labs) and we have done the
>>comparisons.  When we write proposals to NSF, they _always_ push us towards
>>Myrinet because it is cheaper than the cLAN stuff, but it also is lower
>>performance.
>>
>>
>>
>>>
>>>Note that getting a cache line out of local memory of your quad xeons is already
>>>taking about 0.5 usec. You can imagine hopefully that the quoted usecs by the
>>>manufacturer for cLan is based upon bandwidth / time needed. And NOT the RASM
>>>latencies.
>>
>>Your number there is dead wrong.  My cluster is PIII based, with a cache
>>line of 32 bytes.  It uses 4-way interleaving.  lm_bench reports the latency
>>as 132 nanoseconds, _total_.
>>
>>>
>>>Best regards,
>>>Vincent
>>>
>>>
>>>>TCP/IP-ethernet implementation.  However, ethernet will never touch good
>>>>hardware like the cLAN stuff.
>>>>
>>>>MPI/PVM use ethernet - tcp/ip for one obvious reason: "portability" and
>>>>"availability".  :)


