Author: Robert Hyatt
Date: 19:34:49 09/02/03
On September 02, 2003 at 17:48:39, Vincent Diepeveen wrote:

>On September 02, 2003 at 11:00:13, Robert Hyatt wrote:
>
>>On September 01, 2003 at 06:09:48, Mridul Muralidharan wrote:
>>
>>>Hi Jeremiah,
>>>
>>> If you want crafty to get to work with a decent speedup on a 16 or 32 CPU
>>>cc-numa where you have say 2 - 4 processors per node with a significant
>>>inter-node latency (like most higher cpu numa boxes ?!), you will have to
>>>ensure that the way you split, memory usage, etc is optimal - you don't want to
>>>access a hash entry in proc 0 from proc 32 when the latency will be in
>>>milliseconds !!!
>>>
>>>Hope you are appreciating the real world problems - not theoretical issues.
>>>If you actually work on these boxes, you will appreciate the problems faced by
>>>these developers more - using threads on such a box for parallelism, urggh !!
>>>
>>>Regards
>>>Mridul
>>>
>>>PS : Forget getting crafty to work on the 500 CPU beast that Vincent is working
>>>on without a total crafty rewrite ! The horrors Vincent must be facing are
>>>unimaginable - the ones he has already mentioned in this forum and I'm sure,
>>>the more horrible ones he may not have !
>>>
>>
>>A total rewrite is _not_ needed. The search is already designed to work on
>>_any_ type of parallel machine. The issue is allocating data structures on
>
>Wrong, it is needed.

Wrong. It is _not_ needed. I've already done it once for the alpha NUMA
box. I _know_ what I had to do. It wasn't that hard...

>
>No it isn't designed to run on NUMA machines with latencies in the microseconds.
>
>Note that your cluster when using Myrinet network cards will have 10 us.

Great. But I don't use that switch/card. I've told you that before. I
have cLAN hardware. With 0.5 usec latency. Any chance you will _ever_
get that? cLAN is _much_ more expensive than Myrinet hardware. My 8-port
cLAN switch cost me nearly $20,000. Each card was another $1,000. Myrinet
is nowhere near that costly, nor anywhere near that fast.
>
>>the right processor. This will _not_ be a major change. I currently use an
>
>I will save this posting for sure :)
>
>Been working a year fulltime now :)
>

So? It took you over a year to get your parallel search working. It took me
weeks. :)

>>array of split blocks. What I need is an array of pointers to split blocks,
>>so that each processor can allocate a few split blocks in its local memory.
>>Then the block allocator simply has to prefer split blocks in the processor
>>that will be using them, when trying to allocate a split block for a parallel
>>thread to use.
>
>>It is something I have on my list of things to do, but the real issue is
>>"how to allocate _local_ memory" reliably, without wrecking things on
>>non-NUMA machines?
>>
>>That's why I haven't looked at this lately. I looked at it a year ago on a
>>NUMA alpha box, but unfortunately the code was lost when the disk on that
>>machine crashed with no backups. I got a new disk, but the source changes
>>were lost. This was written around Compaq's UPC compiler..
>
>You still don't have a clue what writing software for NUMA is, when that
>hardware has latencies in the microseconds range.

Vincent, I know so much more about parallel computing than you do, I really
don't know where to start in discussing it with you. I _do_ understand
microsecond latency issues. I _do_ understand millisecond latency issues.
I have done both distributed _and_ NUMA-based applications. Get off your
nonsensical "I know more than you do" high-horse. You look like an idiot.

>
>
>
>>>On August 30, 2003 at 10:40:03, Vincent Diepeveen wrote:
>>>
>>>>On August 29, 2003 at 23:41:32, Jeremiah Penery wrote:
>>>>
>>>>>On August 29, 2003 at 18:40:23, Mridul Muralidharan wrote:
>>>>>
>>>>>>I'm not sure of what/why Prof. Bob Hyatt may have made those comments. But to
>>>>>>get a program like crafty to work properly in a numa machine will not be trivial
>>>>>>- and it won't be tweaks, but something more.
>>>>>
>>>>>All multi-CPU Opteron machines are NUMA. Crafty will work just fine in those.
>>>>>It will not be theoretically optimal, but that also depends on the OS to help
>>>>>with NUMA issues.
>>>>
>>>>The OS has to do very little for chess programs. Just keep scheduling the same
>>>>process at the same cpu and physically allocating local memory at that cpu's
>>>>RAM.
>>>>
>>>>Of course for a lot of other services the OS has to do a lot different, yet in
>>>>chess programs we do not need it as most of us, except for example cilkchess,
>>>>write their parallelism at a very low level.
>>>>
>>>>>>Duals, etc count as SMP machine not cc-numa which I was referring to.
>>>>>Dual Opterons are NUMA.
>>>>
>>>>And soon all duals that we can afford will be.
>>>>
>>>>Best regards,
>>>>Vincent
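
For anyone following along, here is a rough sketch of the split-block change described in the reply above: an array of pointers to split blocks, a few blocks allocated per processor (node), and an allocator that prefers blocks belonging to the node that will use them. This is not Crafty's code. Every name here (SplitBlock, alloc_on_node, init_split_blocks, allocate_split_block, release_split_block, MAX_NODES, BLOCKS_PER_NODE) is made up for illustration, and alloc_on_node() just calls malloc(), since how to get _local_ memory reliably is exactly the part the post leaves open.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NODES       4  /* hypothetical number of NUMA nodes */
#define BLOCKS_PER_NODE 2  /* hypothetical split blocks kept per node */

typedef struct {
  int used;       /* nonzero while a search thread owns this block */
  int home_node;  /* node whose memory is supposed to hold the block */
} SplitBlock;

/* An array of pointers to split blocks, rather than an array of blocks,
   so each block can live in a different node's memory. */
static SplitBlock *split_blocks[MAX_NODES][BLOCKS_PER_NODE];

/* ASSUMPTION: stand-in for a node-local allocator. It ignores the node
   and uses malloc(), which is also what a non-NUMA machine would want. */
static void *alloc_on_node(int node, size_t bytes) {
  (void) node;
  return malloc(bytes);
}

/* Allocate a few split blocks for every node. Returns 0 on success. */
int init_split_blocks(void) {
  for (int node = 0; node < MAX_NODES; node++) {
    for (int i = 0; i < BLOCKS_PER_NODE; i++) {
      SplitBlock *b = alloc_on_node(node, sizeof(SplitBlock));
      if (b == NULL)
        return -1;
      b->used = 0;
      b->home_node = node;
      split_blocks[node][i] = b;
    }
  }
  return 0;
}

/* Prefer a free block on the requesting node; only when that node has
   none, fall back to the other nodes in turn. Returns NULL if every
   block is in use. Not thread safe as written: a real parallel search
   would need a lock around this scan. */
SplitBlock *allocate_split_block(int node) {
  assert(node >= 0 && node < MAX_NODES);
  for (int step = 0; step < MAX_NODES; step++) {
    int n = (node + step) % MAX_NODES;
    for (int i = 0; i < BLOCKS_PER_NODE; i++) {
      if (!split_blocks[n][i]->used) {
        split_blocks[n][i]->used = 1;
        return split_blocks[n][i];
      }
    }
  }
  return NULL;
}

/* Hand a block back so it can be reused. */
void release_split_block(SplitBlock *b) {
  b->used = 0;
}
```

A thread running on node 1 would call allocate_split_block(1) at a split point and release_split_block() when the parallel search below that point finishes. On a non-NUMA machine nothing changes except that all the blocks come from ordinary malloc() memory, so the same code should not wreck anything there.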
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.