Author: Vincent Diepeveen
Date: 10:26:04 02/01/05
On February 01, 2005 at 12:59:31, Robert Hyatt wrote:

>On February 01, 2005 at 10:59:18, Vincent Diepeveen wrote:
>
>>On February 01, 2005 at 00:56:17, Robert Hyatt wrote:
>>
>>>On January 31, 2005 at 13:35:02, Vincent Diepeveen wrote:
>>>
>>>>On January 31, 2005 at 13:03:43, Robert Hyatt wrote:
>>>>
>>>>>On January 31, 2005 at 10:14:28, Vincent Diepeveen wrote:
>>>>>
>>>>>>On January 31, 2005 at 10:01:16, Vincent Lejeune wrote:
>>>>>>
>>>>>>>
>>>>>>>news from 28/01/05 (more to come)
>>>>>>>
>>>>>>>http://hydrachess.com/hydra-scylla.html
>>>>>>>
>>>>>>>32 nodes (the previous version had 16), no information about CPU power and FPGA
>>>>>>>cards yet ...
>>>>>>
>>>>>>It's 4 nodes.
>>>>>>
>>>>>>1 node = 8 processor Xeon.
>>>>>>
>>>>>>FPGA cards would get double speed. So must be between 30Mhz and 60Mhz. They only
>>>>>>use development fpga cards. So they never use the real power of fpga (which is
>>>>>>printing your own processor, which can run hands down at 600Mhz or more). They
>>>>>>stick to development cards for some reason unknown to me.
>>>>>>
>>>>>>CPU power is not interesting at all of course, the cards do the work.
>>>>>>
>>>>>>Vincent
>>>>>
>>>>>I hope not. Old machine used 8 boxes with 2 cpus per box. Going to 8-way xeons
>>>>>is a performance killer. The PCI bus just can't keep up.
>>>>
>>>>Ok, neither the sheikh nor his right hand knew the architectural details very
>>>>well themselves, which we will forgive them.
>>>>
>>>>I have accurate information now.
>>>>
>>>>It is a 32 node system, with each node a dual. They have however only 32 FPGA
>>>>cards, because when ordering, the person filling in the form (and i am NOT going
>>>>to post who it was, but it was NOT the sheikh) confused nodes for cpus.
>>>>
>>>>So they have a mighty 32 node myrinet now with 64 processors. However, with 32
>>>>cards, they run 64 processors effectively while being served by 32 cards which do
>>>>the job. Cards at 55Mhz.
>>>>
>>>>Please note that the PCI bus isn't the problem. They are using pci-x.
>>>
>>>PCI-X falls flat if you have 8 cpus in a single box. I have run on such
>>
>>It's not the pci-x which is the problem at all.
>>
>>It's that it is simply tough programming to get it to work.
>
>No it isn't. One bus to memory, 8 processors hanging on the bus trying to get
>to memory. They get in the way of each other, and that is why bus architectures
>don't scale very well beyond 4. And even going to 4 requires 4-way interleaving
>to keep up. But they don't go to 8-way interleaving on the Dell-type boxes.
>Others do but they have a price that shows it...
>
>For 8 and up, a crossbar is really the right way to go, if the price can be
>handled. Otherwise a NUMA-type approach like the AMD solution is most
>affordable.
>
>>
>>You simply must limit any remote read or write to the ultimate maximum.
>>
>>Multithreading, forget it.
>
>multi-threading or multiple processes is not the issue here. What you can do
>with one, you can do with the other. One just offers easier-to-use features for
>certain applications.
>
>
>>
>>>machines. PCI does pretty well on 4way systems, but on 8-way, overall gain
>>
>>PCI is at least 4 times slower than pci-x in latency.
>>
>>pci-x can easily give 1 us if the network card is fast enough.
>>
>
>We are not talking about network. We are talking (at least I am talking) about
>SMP-type boxes with 8 cpus in a single chassis, like the 8-cpu xeon Dell sells,
>or like the 4-cpu xeon boxes I have here...

I'm not talking about 8 way smp boxes. Those are more expensive than a 32
processor cluster when you put the latest processors inside.

>
>
>>practical even cheapo myrinet gives 2.7 us.
>>
>>>seems to be 1.5x a 4-way which is not that great. If you run a program that
>>>runs out of cache quickly, this drops even further.
>>
>>you can't run smp programs over networks obviously.
>
>That's why I wasn't talking about networks. You originally said this machine
>(New Hydra) has a node with 8 processors. That is what I am talking about.

Perhaps in the future read the subject.

>
>
>>
>>>
>>>>
>>>>Latency of a one way pingpong is around 2.7 us with myrinet. That excludes the
>>>>router costs, which i guess will also be at about 1 us for random data traffic
>>>>(up to 35 ns for bandwidth traffic).
>>>>
>>>>Vincent
>>>
>>>
>>>All depends. We have myrinet here and are probably going to use that in our new
>>>opteron cluster when we buy it after the dual-core opterons start shipping in
>>>quantity...
>>
>>For chess myrinet sucks ass, to say it very politely, because it doesn't allow
>>DSM (distributed shared memory).
>>
>>For just a few dollars more you can get quadrics or dolphin, which have better
>>latencies (dolphin 1 us) and allow distributed shared memory.
>>
>>The real major problem with myrinet is that the receiving process must non stop
>>receive the messages and process them. So you must do some kind of hand-timed
>>polling within the search process to handle just that.
>>
>>With DSM your processes don't feel any of that.
>>
>>An 8 node quadrics network is 13095 dollar. That includes everything.
>>
>>Quadrics is used in the fastest supercomputers, like the nuclear supercomputer
>>France ordered a while ago. It scales far better than myrinet when you
>>start scaling above those 8 nodes.
>>
>>For chess, using the DSM features in a program is not so trivial, but it is
>>pretty easy compared to the task of parallelizing the product in the first place.
>>
>>Vincent
>
>
>Just remember that most supercomputer applications don't care about latency,

Just remember that I don't build a cluster for a matrix calculation, but for
DIEP :) In which case myrinet sucks :)

>they care about bandwidth. Large applications are all about streaming data, the
>latency for the first word is not important when several million are going to
>follow back-to-back. All that matters to the big applications is how frequently
>do I get the next word, the latency for the first word gets buried in the cost
>to transfer the remaining millions of words. That's what makes vector computers
>so powerful for the right kinds of applications, as opposed to these "toy
>supercomputers" that just use lots of general purpose processors and sloppy
>interconnections.

The majority of supercomputers are not vector computers. Latency is important
at supercomputers too. The majority of jobs run at supercomputers use 4-8
processors and eat half the system time a year. The other half is matrix
calculations that could trivially run at cheapo clusters.

Only a few applications are really optimized, and I wonder why they don't run
those on clusters but instead use very expensive SGI type hardware for it,
which delivers very little flops per dollar.

Vincent
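The 2.7 us one-way figure quoted above is the kind of number a standard pingpong
microbenchmark reports: two ranks bounce a tiny message back and forth and half
the average round-trip time is taken as the one-way latency. Below is a minimal
sketch of such a benchmark in C with MPI; it is illustrative only (not the code
either poster ran), and the iteration count and 8-byte payload are arbitrary
choices.

    /* pingpong.c - hedged sketch of a one-way latency microbenchmark.
       Run with two ranks, e.g.: mpirun -np 2 ./pingpong            */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        const int iters = 100000;
        char buf[8] = {0};          /* tiny payload: latency, not bandwidth */
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)   /* half the round trip = one-way latency, in us */
            printf("one-way latency: %.2f us\n",
                   (t1 - t0) / iters / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }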
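The "hand-timed polling within the search process" mentioned above is the cost
of a pure message-passing interconnect: without DSM, the engine itself has to
check the network for incoming messages from inside the search, often enough
that other nodes never wait long. A minimal sketch of that idea, assuming MPI
and a hypothetical handle_message() dispatcher (this is not DIEP's actual code;
the polling interval and buffer size are illustrative assumptions):

    /* poll.c - hedged sketch of network polling inside a chess search. */
    #include <mpi.h>

    #define POLL_INTERVAL 1000      /* check the network every N nodes searched */
    #define MAX_MSG 4096            /* assumption: all messages fit in this buffer */

    static long nodes_searched = 0;

    /* Receive one pending message and dispatch it.  A real engine would
       switch on the tag: split request, search result, abort, etc. */
    static void handle_message(const MPI_Status *st)
    {
        int count;
        char buf[MAX_MSG];
        MPI_Get_count(st, MPI_BYTE, &count);
        MPI_Recv(buf, count, MPI_BYTE, st->MPI_SOURCE, st->MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... act on buf here ... */
    }

    /* Called at every node of the recursive search. */
    static void poll_network(void)
    {
        if (++nodes_searched % POLL_INTERVAL != 0)
            return;                 /* keep polling overhead off the hot path */

        int flag;
        MPI_Status st;
        /* non-blocking probe: is anything waiting for us? */
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
        while (flag) {
            handle_message(&st);
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
        }
    }

The point of the DSM comparison is that with an interconnect offering
distributed shared memory, a remote read or write completes in the network
hardware, so the search loop carries no such polling hook and no tuning of the
polling interval.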