Computer Chess Club Archives



Subject: Re: New intel 64 bit ?

Author: Robert Hyatt

Date: 08:58:58 07/08/03

On July 08, 2003 at 08:49:48, Vincent Diepeveen wrote:

>On July 07, 2003 at 10:48:02, Robert Hyatt wrote:
>
>>On July 05, 2003 at 23:37:47, Jay Urbanski wrote:
>>
>>>On July 04, 2003 at 23:33:46, Robert Hyatt wrote:
>>>
>>><snip>
>>>>"way better than MPI".  Both use TCP/IP, just like PVM.  Except that MPI/OpenMP
>>>>is designed for homogeneous clusters while PVM works with heterogeneous mixes.
>>>>But for any of the above, the latency is caused by TCP/IP, _not_ the particular
>>>>library being used.
>>>
>>>With latency a concern I don't know why you'd use TCP/IP as the transport for
>>>MPI when there are much faster ones available.
>>>
>>>Even VIA over Ethernet would be an improvement.
>>
>>I use VIA over ethernet, and VIA over a cLAN Giganet switch as well.  The
>>cLAN hardware produces .5 usec latency, which is about 1000X better than any
>
>Bob, the latencies that i quote are RASML : Random Average Shared Memory
>Latencies.
>
>The latencies that you quote here are sequential latencies. Bandwidth divided by
>the number of seconds = latency (according to the manufacturers).

No it isn't.  It is computed by _me_, by randomly sending packets to different
nodes on this cluster and measuring the latency.  I'm not interested in any
kind of bandwidth number; I _know_ that is high, even on a gigabit
ethernet switch.  I'm interested in the latency: how long does it take me to
get a packet from A to B?  And there, ethernet (including gigabit) is slow.

The cLAN with VIA is not.

I.e., on this particular cluster, it takes about 1/2 usec to get a short
packet from A to B.  The longer the packet, the longer the latency, since I
assume that I need the last byte before I can use the first byte, which
might not always be true.
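The point about packet length amounts to a simple model: delivery time is a
fixed per-message latency plus the serialization time of the payload.  A
minimal sketch in Python, with illustrative constants only (the 0.5 usec
figure comes from this post; the 1 Gbit/s bandwidth is an assumed placeholder,
not a cLAN specification):

```python
def transfer_time(nbytes, latency_s=0.5e-6, bandwidth_bps=1.0e9):
    """Time to deliver a message, assuming the receiver must wait for
    the last byte before using the first (as described above).

    latency_s:      fixed per-message cost (illustrative: 0.5 usec)
    bandwidth_bps:  link bandwidth in bits/second (assumed placeholder)
    """
    return latency_s + (nbytes * 8) / bandwidth_bps

# A short packet is dominated by the fixed latency;
# a long packet is dominated by serialization time:
short = transfer_time(64)      # ~latency-bound
long_ = transfer_time(65536)   # ~bandwidth-bound
```

Under this model, shaving the fixed latency (as cLAN/VIA does) matters far
more for the short messages a chess program exchanges than raw bandwidth does.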

VIA has some cute stuff to "share memory" too.

>
>For computer chess that can't be used however.
>
>You can get a more accurate indication by using the well-known ping-pong
>program.  What it does is ship messages over MPI and then WAIT for them to
>come back, then divide that time by 2.  That is called the one-way ping-pong
>latency.

That's how _I_ measure latency.  I know of no other way, since keeping two
machines' clocks synced that accurately is not easy.
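The ping-pong measurement both posters describe can be sketched in a few
lines.  This is an illustrative Python version over a loopback TCP socket
rather than MPI or VIA (the socket setup and the 127.0.0.1 address are
assumptions made to keep the example self-contained); the principle is the
same: time N round trips on one clock and halve the average.

```python
import socket
import threading
import time

def recv_exact(conn, nbytes):
    """Read exactly nbytes from a stream socket."""
    buf = b""
    while len(buf) < nbytes:
        chunk = conn.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def pingpong_latency(iters=1000, size=64):
    """Estimate one-way latency as (round-trip time) / 2."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))       # OS picks a free port
    server.listen(1)
    port = server.getsockname()[1]

    def echo():                         # server side: bounce every message back
        conn, _ = server.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(iters):
                conn.sendall(recv_exact(conn, size))

    t = threading.Thread(target=echo)
    t.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no batching
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(iters):              # client side: send, then WAIT for echo
        client.sendall(msg)
        recv_exact(client, size)
    elapsed = time.perf_counter() - start

    client.close()
    t.join()
    server.close()
    return elapsed / iters / 2          # one-way latency, in seconds
```

Run over loopback this mostly measures kernel and protocol overhead; run
between two real nodes it measures the network, which is where the ethernet
vs. cLAN/Myrinet differences discussed in this thread show up.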


>
>If you multiply that by 2, you already get closer to the latency that it takes
>to get a single bitboard out of memory.

It doesn't take me .5usec to get a bitboard out of memory.  Unless you are
talking about a NUMA machine where machine A wants the bitboard and it is
not in its local memory.


>
>Even better is using the RASML test i wrote. That's using OpenMP though but
>conversion to MPI is trivial (yet slowing down things so much that it is less
>accurate than openmp).
>
>So the best indication you can get is by doing a simple pingpong latency test.

I do this all the time.

>
>The best ethernet network cards are Myrinet network cards (about $1300).  I do
>not know which chipset they have.  On a 64-bit/133MHz PCI-X bus (Jay might know
>more about the specifications here) they can achieve something like 5 usec
>one-way ping-pong latency, so that's a minimum of well over 10 usec to get a
>bitboard from the other side of the machine.

Correct.  cLAN is faster.  It is also more expensive.  The 8-port switch we
use cost us about $18,000 two years ago.  Myrinet was designed as a lower-cost
network, with somewhat lower performance.

>
>In your cluster you probably do not have such PCI stuff Bob. Most likely it is
>around 10 usec for one way latency at your cluster so you can get at minimum of
>20 usec to get a message.

In my cluster I have PCI cards that are faster than Myrinet.  They were made by
cLAN (again) and we paid about $1,500 each for them two years ago.  Again, you
can find info about the cLAN stuff and compare it to myrinet if you want.  We
have Myrinet stuff here on campus (not in any of my labs) and we have done the
comparisons.  When we write proposals to NSF, they _always_ push us towards
Myrinet because it is cheaper than the cLAN stuff, but it also is lower
performance.



>
>Note that getting a cache line out of local memory of your quad xeons already
>takes about 0.5 usec.  You can hopefully imagine that the usecs quoted by the
>manufacturer for cLAN are based upon bandwidth / time needed, and NOT the
>RASML latencies.

Your number there is dead wrong.  My cluster is PIII based, with a cache
line of 32 bytes.  It uses 4-way interleaving.  lmbench reports the latency
as 132 nanoseconds, _total_.

>
>Best regards,
>Vincent
>
>
>>TCP/IP-ethernet implementation.  However, ethernet will never touch good
>>hardware like the cLAN stuff.
>>
>>MPI/PVM use ethernet - tcp/ip for one obvious reason: "portability" and
>>"availability".  :)




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.