Computer Chess Club Archives


Subject: Re: New intel 64 bit ?

Author: Vincent Diepeveen

Date: 21:23:52 07/08/03


On July 08, 2003 at 11:58:58, Robert Hyatt wrote:

>On July 08, 2003 at 08:49:48, Vincent Diepeveen wrote:
>
>>On July 07, 2003 at 10:48:02, Robert Hyatt wrote:
>>
>>>On July 05, 2003 at 23:37:47, Jay Urbanski wrote:
>>>
>>>>On July 04, 2003 at 23:33:46, Robert Hyatt wrote:
>>>>
>>>><snip>
>>>>>"way better than MPI".  Both use TCP/IP, just like PVM.  Except that MPI/OpenMP
>>>>>is designed for homogeneous clusters while PVM works with heterogeneous mixes.
>>>>>But for any of the above, the latency is caused by TCP/IP, _not_ the particular
>>>>>library being used.
>>>>
>>>>With latency a concern I don't know why you'd use TCP/IP as the transport for
>>>>MPI when there are much faster ones available.
>>>>
>>>>Even VIA over Ethernet would be an improvement.
>>>
>>>I use VIA over ethernet, and VIA over a cLAN giganet switch as well.  The
>>>cLAN hardware produces .5usec latency which is about 1000X better than any
>>
>>Bob, the latencies that i quote are RASML : Random Average Shared Memory
>>Latencies.
>>
>>The latencies that you quote here are sequential latencies. Bandwidth divided by
>>the number of seconds = latency (according to the manufacturers).
>
>No it isn't.  It is computed by _me_.  By randomly sending packets to different
>nodes on this cluster and measuring the latency.  I'm not interested in any

You need to ship a packet and then WAIT for it to get back. The simplest test is
a one-way ping pong. I will email you that program now.

You will see about a 20-30 usec latency then.
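
A minimal sketch of the ping pong measurement described here, written in Python rather than MPI so that it is self-contained. It ships a small message, WAITS for the echo, and divides the average round trip by 2. The function names (`echo`, `one_way_latency`, `demo`) and the 8-byte payload are assumptions made for illustration. Both ends run on one machine over a local socket pair, so the number it prints only demonstrates the method; a real measurement has to run the two ends on 2 different nodes.

```python
import socket
import threading
import time


def echo(sock: socket.socket) -> None:
    """The 'pong' side: send every received byte straight back."""
    with sock:
        while True:
            data = sock.recv(4096)
            if not data:          # peer closed the connection
                break
            sock.sendall(data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Block until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def one_way_latency(sock: socket.socket, rounds: int, size: int) -> float:
    """Average round-trip time divided by 2, in seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        sock.sendall(payload)     # ping
        recv_exact(sock, size)    # wait for the pong before sending again
    elapsed = time.perf_counter() - start
    return elapsed / rounds / 2.0


def demo(rounds: int = 1000, size: int = 8) -> float:
    """Run both ends locally (assumption: illustration only, not a cluster test)."""
    ping_side, pong_side = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong_side,), daemon=True)
    worker.start()
    with ping_side:
        latency = one_way_latency(ping_side, rounds, size)
    worker.join(timeout=5)
    return latency


if __name__ == "__main__":
    print(f"one-way ping pong latency: {demo() * 1e6:.1f} usec")
```

The key point is that each send is followed by a blocking receive, so the timing covers a full ship and receive instead of a one-directional stream.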

>kind of bandwidth number.  I _know_ that is high.  It is high on a gigabit
>ethernet switch.  I'm interested in the latency, how long does it take me to
>get a packet from A to B, and there ethernet (including gigabit) is slow.

>The cLAN with VIA is not.

>IE on this particular cluster, it takes about 1/2 usec to get a short
>packet from A to B.  The longer the packet, the longer the latency since I
>assume that I need the last byte before I can use the first byte, which
>might not always be true.
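
The quoted point about packet length can be sketched with a simple model: delivery time is a fixed latency plus the time to push the whole packet through the link, assuming the last byte is needed before the first can be used. The 0.5 usec base is the figure quoted above; the bandwidth value and the function name `delivery_time_usec` are hypothetical, chosen only to show the arithmetic.

```python
def delivery_time_usec(size_bytes: int,
                       base_latency_usec: float,
                       bandwidth_mbytes_per_sec: float) -> float:
    """Time until the LAST byte of a packet has arrived, in microseconds.

    1 MB/s (10**6 bytes per second) equals 1 byte per microsecond, so the
    transfer term is simply size_bytes / bandwidth_mbytes_per_sec.
    """
    return base_latency_usec + size_bytes / bandwidth_mbytes_per_sec


if __name__ == "__main__":
    base = 0.5          # usec, the short-packet figure quoted in the thread
    bandwidth = 100.0   # MB/s, a hypothetical value for illustration only
    for size in (8, 64, 4096):
        t = delivery_time_usec(size, base, bandwidth)
        print(f"{size:5d} bytes: {t:.2f} usec")
```

Under these assumed numbers a bitboard-sized packet (8 bytes) is dominated by the fixed latency, while a 4096-byte packet is dominated by the transfer time.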

Bob, this is not one-way ping pong latency. Not to mention that it isn't a full
ship and receive.

In computer chess you don't ship something without waiting for an answer back.
You *want* an answer back.

For example, if you want to split a node :)

The 0.5 usec latency is based upon shipping a terabyte of data without an answer back.

Bandwidth / time needed = latency then.

What I tried to explain to you is RASML, but I know you won't understand it.

Rather than waste time on this, I'll just email the thing to you.

Run it any time you like, but run it on 2 different nodes. Don't run it on the
same node :)

>VIA has some cute stuff to "share memory" too.
>
>>
>>For computer chess that can't be used however.
>>
>>You can get a more accurate indication by using the well-known ping pong
>>program. Over MPI, it ships messages and then WAITS for them to
>>come back. Then it divides that time by 2. That is called the one-way ping pong
>>latency.
>
>That's how _I_ measure latency.  I know of no other way, since keeping two
>machine clocks synced that accurately is not easy.
>
>
>>
>>If you multiply that by 2, you already get closer to the latency that it takes
>>to get a single bitboard out of memory.
>
>It doesn't take me .5usec to get a bitboard out of memory.  Unless you are
>talking about a NUMA machine where machine A wants the bitboard and it is
>not in its local memory.
>
>
>>
>>Even better is using the RASML test I wrote. It uses OpenMP, though
>>conversion to MPI is trivial (yet slows things down so much that it is less
>>accurate than OpenMP).
>>
>>So the best indication you can get is by doing a simple pingpong latency test.
>
>I do this all the time.
>
>>
>>The best network cards are Myrinet network cards (about $1300). I do not
>>know which chipset they have. At 133MHz PCI64X (Jay might know
>>more about specifications here) they can achieve about 5 usec one-way ping pong
>>latency, so that's a minimum of way more than 10 usec to get a bitboard from the
>>other side of the machine.
>
>Correct.  cLAN is faster.  It is also more expensive.  The 8-port switch we
>use cost us about $18,000 two years ago.  Myrinet was designed as a lower-cost
>network.  With somewhat lower performance.
>
>>
>>In your cluster you probably do not have such PCI stuff, Bob. Most likely it is
>>around 10 usec for one-way latency on your cluster, so you need a minimum of
>>20 usec to get a message.
>
>In my cluster I have PCI cards that are faster than Myrinet.  They were made by
>cLAN (again) and we paid about $1,500 each for them two years ago.  Again, you
>can find info about the cLAN stuff and compare it to myrinet if you want.  We
>have Myrinet stuff here on campus (not in any of my labs) and we have done the
>comparisons.  When we write proposals to NSF, they _always_ push us towards
>Myrinet because it is cheaper than the cLAN stuff, but it also is lower
>performance.
>
>
>
>>
>>Note that getting a cache line out of local memory of your quad xeons is already
>>taking about 0.5 usec. You can imagine hopefully that the quoted usecs by the
>>manufacturer for cLan is based upon bandwidth / time needed. And NOT the RASM
>>latencies.
>
>Your number there is dead wrong.  My cluster is PIII based, with a cache
>line of 32 bytes.  It uses 4-way interleaving.  lm_bench reports the latency
>as 132 nanoseconds, _total_.
>
>>
>>Best regards,
>>Vincent
>>
>>
>>>TCP/IP-ethernet implementation.  However, ethernet will never touch good
>>>hardware like the cLAN stuff.
>>>
>>>MPI/PVM use ethernet - tcp/ip for one obvious reason: "portability" and
>>>"availability".  :)




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.