Author: Robert Hyatt
Date: 20:33:46 07/04/03
On July 03, 2003 at 20:13:06, Vincent Diepeveen wrote:

>On July 03, 2003 at 18:15:01, Robert Hyatt wrote:
>
>>On July 03, 2003 at 16:50:29, Chris Hull wrote:
>>
>>>On July 03, 2003 at 13:03:13, Robert Hyatt wrote:
>>>
>>>>On July 03, 2003 at 05:51:51, Russell Reagan wrote:
>>>>
>>>>>On July 03, 2003 at 05:31:15, Tony Werten wrote:
>>>>>
>>>>>>http://www.digitimes.com/NewsShow/Article.asp?datePublish=2003/07/01&pages=02&seq=3
>>>>>>
>>>>>>Tony
>>>>>
>>>>>Interesting news. Some things the article says make me think this is
>>>>>nothing to get excited about.
>>>>>
>>>>>"targeting the high-priced, back-end server market" - This makes me think
>>>>>"nothing new here, the Itanium has been out of the price range of everyone
>>>>>for years anyway." I can't imagine them competing with the Opteron (much
>>>>>less the Athlon64) if they can't come way down in price.
>>>>>
>>>>>It says something about a lower-end CPU for workstations, but the way they
>>>>>put it (maybe it's just the writer) makes it sound (to me) like the
>>>>>high-end Itanium will still cost significantly more than the Opteron, the
>>>>>low-end Itanium will still cost significantly more than the Athlon64, and
>>>>>the really-low-end Xeon might be in the price range of the Opteron.
>>>>>
>>>>>"Intel servers containing eight to 128 Itanium processors..."
>>>>>
>>>>>So Bob, what is the expected speedup of Crafty on a 128-Itanium machine? :)
>>>>
>>>>Hard to say, since it is a NUMA-type machine. There are lots of issues
>>>>there.
>>>
>>>OK, that begs the question: can Crafty be made to work on a NUMA-type
>>>cluster? How about on a message-passing cluster using PVM or MPI? Not just
>>>made to work, but to actually see SMP-like speedups on 4/8/16/32/64-node
>>>clusters.
>>>
>>>Chris
>>
>>The answer to both is "yes".
>>
>>NUMA is a problem, but it is solvable. The problem is that the current way
>>of allocating "split blocks" is not good for NUMA machines. A NUMA machine
>>_really_ wants its often-accessed data to be in its local memory, and I
>>don't have any way of forcing that at the moment. It would not be terribly
>>difficult to change: allocate a bunch of split blocks in each CPU's local
>>memory, then make sure the right split block is used by the right
>>processor. But on an SMP box this is moot, so it was not done in the
>>original design.
>>
>>Clustering is harder, since suddenly there is no shared memory at all,
>
>He asked about an MPI library, and that *means* not using shared memory.
>

I believe I said that.

>Also, all those Itanium things are sold as 'clusters'.
>
>Latency from the good old Origin 3800 with MPI is even way better than from
>the new SGI Altix 3000 with Madisons and MPI.
>
>That's weird, because the design looks OK to me.
>
>Then please consider that this Altix is kicking butt compared to other
>Itanium clusters on that overly praised TPC benchmark.
>
>>which changes both the overall structure of the program and the underlying
>>assumption that "it is easy to do a quick parallel search and get a result
>>back", because network latency suddenly turns something quick into
>>something with a significant delay.
>>
>>SMP-like speedups are likely not possible for chess, because of the way the
>>alpha/beta algorithm is built around sequential searching. But reasonable
>>speedup is definitely possible. Who cares if 1000 processors is only 100X
>>faster? 100X is _way_ faster.
>
>How do you get 100x out of 1000 CPUs with MPI?
>
>Please tell me. It is not so trivial.
I don't know whether it is trivial or not; I have not yet tried it with chess.
But getting 10% efficiency (that "100X on 1000 processors" figure) seems
doable, based on Schaeffer's results over the years. 10% is lousy for SMP,
particularly since an SMP box doesn't offer many CPUs, but 10% would be
acceptable for a cluster, since a cluster can be made arbitrarily large.

>
>I got 500 processors with OpenMP, way better than MPI. But it is still very
>hard to get a good speedup. I have been working on it for nearly a year
>already. It is *not* trivial.

You had better look at how "OpenMP" is implemented before saying that. And it
isn't "way better than MPI". Both use TCP/IP, just like PVM, except that
MPI/OpenMP are designed for homogeneous clusters while PVM works with
heterogeneous mixes. For any of the above, the latency is caused by TCP/IP,
_not_ by the particular library being used. There are alternatives, such as
those used on NUMA clusters, where TCP/IP is not used and the messages travel
over the architecture's backplane instead. But if OpenMP beats MPI, it only
means that someone hasn't ported MPI very well to the platform you are using,
or that OpenMP is using the hardware you have while MPI is still going through
TCP/IP even when the traffic crosses the backplane.

>
>Best regards,
>Vincent
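
To put a rough number on that latency point: the usual measurement is a bare
MPI ping-pong loop between two ranks. This is just a generic sketch, nothing
Crafty-specific; the 64-byte message size and the iteration count are
arbitrary stand-ins for a small "split request":

    #include <mpi.h>
    #include <stdio.h>

    #define ITERS 1000

    int main(int argc, char **argv) {
        int rank, i;
        char msg[64] = {0};   /* stand-in for a small "please search this" message */
        double start, end;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        start = MPI_Wtime();
        for (i = 0; i < ITERS; i++) {
            if (rank == 0) {          /* the "master" side of a split */
                MPI_Send(msg, sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(msg, sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {   /* the "helper" side */
                MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        end = MPI_Wtime();

        if (rank == 0)
            printf("average round trip: %.1f microseconds\n",
                   (end - start) / ITERS * 1.0e6);

        MPI_Finalize();
        return 0;
    }

Run it with two ranks on two different boxes (mpirun -np 2 ./pingpong). Over
plain TCP/ethernet the round trip is typically on the order of a hundred
microseconds or more, while the same exchange through shared memory on an SMP
box is a microsecond or so. That gap is exactly why a split that is
essentially free inside one box is anything but free across a network.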
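
And on the split-block point quoted further up, the change would look roughly
like this: allocate a pool of split blocks per node out of that node's local
memory, and hand each processor blocks from its own node. This is only a
sketch under Linux/libnuma assumptions, not Crafty's actual code; the
SplitBlock layout and the block counts are made up for illustration:

    #include <numa.h>     /* libnuma; link with -lnuma; assumes a Linux NUMA box */
    #include <stdio.h>

    #define MAX_NODES        64
    #define BLOCKS_PER_NODE  16   /* arbitrary; the real count would be tuned */

    /* Hypothetical split block -- the real one holds a position, a move list,
       search bounds, etc.  Only the per-node allocation idea matters here. */
    typedef struct {
        int used;
        int alpha, beta, value;
    } SplitBlock;

    SplitBlock *pool[MAX_NODES][BLOCKS_PER_NODE];

    int main(void) {
        int node, i, nodes;

        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this machine\n");
            return 1;
        }
        nodes = numa_max_node() + 1;

        /* Each node's blocks come from that node's local memory, so a processor
           splitting there never reaches across the interconnect for its own
           scratch data. */
        for (node = 0; node < nodes; node++)
            for (i = 0; i < BLOCKS_PER_NODE; i++)
                pool[node][i] = numa_alloc_onnode(sizeof(SplitBlock), node);

        printf("%d split blocks on each of %d nodes\n", BLOCKS_PER_NODE, nodes);
        return 0;
    }

A thread pinned to node N (with numa_run_on_node() or sched_setaffinity())
would then only ever grab blocks from pool[N][...], which is the "right split
block for the right processor" part.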