Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Crafty and NUMA

Author: Robert Hyatt

Date: 19:34:49 09/02/03



On September 02, 2003 at 17:48:39, Vincent Diepeveen wrote:

>On September 02, 2003 at 11:00:13, Robert Hyatt wrote:
>
>>On September 01, 2003 at 06:09:48, Mridul Muralidharan wrote:
>>
>>>Hi Jeremiah,
>>>
>>>  If you want Crafty to work with a decent speedup on a 16- or 32-CPU
>>>cc-NUMA machine where you have, say, 2-4 processors per node with a significant
>>>inter-node latency (like most higher-CPU-count NUMA boxes?), you will have to
>>>ensure that the way you split, memory usage, etc. is optimal. You don't want to
>>>access a hash entry in proc 0 from proc 32 when the latency will be in
>>>milliseconds!!!
>>>
>>>I hope you appreciate the real-world problems, not just theoretical issues.
>>>If you actually work on these boxes, you will better appreciate the problems
>>>these developers face. Using threads for parallelism on such a box, urggh!!
>>>
>>>Regards
>>>Mridul
>>>
>>>PS: Forget getting Crafty to work on the 500-CPU beast that Vincent is working
>>>on without a total Crafty rewrite! The horrors Vincent must be facing are
>>>unimaginable: the ones he has already mentioned in this forum and, I'm sure,
>>>the even worse ones he may not have mentioned!
>>>
>>
>>A total rewrite is _not_ needed.  The search is already designed to work on
>>_any_ type of parallel machine.  The issue is allocating data structures on
>
>Wrong, it is needed.

Wrong.  It is _not_ needed.  I've already done it once for the Alpha NUMA
box.  I _know_ what I had to do.  It wasn't that hard...


>
>No, it isn't designed to run on NUMA machines with latencies in the microseconds.
>
>Note that your cluster, when using Myrinet network cards, will have 10 us latency.

Great.  But I don't use that switch/card.  I've told you that before.  I
have cLAN hardware, with 0.5 usec latency.  Any chance you will _ever_ get
that?  cLAN is _much_ more expensive than Myrinet hardware.  My 8-port
cLAN switch cost me nearly $20,000.  Each card was another $1,000.  Myrinet
is nowhere near that costly, nor anywhere near that fast.


>
>>the right processor.  This will _not_ be a major change.  I currently use an
>
>I will save this posting for sure :)
>
>I've been working on it full-time for a year now :)
>

So?  It took you over a year to get your parallel search working.  It took
me weeks.

:)




>>array of split blocks.  What I need is an array of pointers to split blocks,
>>so that each processor can allocate a few split blocks in its local memory.
>>Then the block allocator simply has to prefer split blocks in the processor
>>that will be using them, when trying to allocate a split block for a parallel
>>thread to use.
>
>>It is something I have on my list of things to do, but the real issue is
>>"how to allocate _local_ memory" reliably, without wrecking things on non-
>>NUMA machines?
>>
>>That's why I haven't looked at this lately.  I looked at it a year ago on a
>>NUMA Alpha box, but unfortunately the code was lost when the disk on that
>>machine crashed with no backups.  I got a new disk, but the source changes
>>were lost.  This was written around Compaq's UPC compiler.
>
>You still don't have a clue what writing software for NUMA is, when that
>hardware has latencies in the microseconds range.

Vincent, I know so much more about parallel computing than you do that I really
don't know where to start in discussing it with you.  I _do_ understand
microsecond latency issues.  I _do_ understand millisecond latency issues.
I have done both distributed _and_ NUMA-based applications.

Get off your nonsensical "I know more than you do" high-horse.  You look like
an idiot.
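
A minimal sketch of the split-block scheme described in the quoted text above: an array of pointers to split blocks instead of one contiguous array, so each processor's blocks can sit in its local memory, plus an allocator that prefers blocks local to the requesting processor and falls back to any free block. This is not Crafty's code. `split_block`, `init_blocks()`, `get_block()`, `free_block()`, `MAX_CPUS` and `BLOCKS_PER_CPU` are hypothetical names, and callers are assumed to hold whatever lock guards the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CPUS 4        /* hypothetical machine size */
#define BLOCKS_PER_CPU 8  /* hypothetical per-CPU pool size */

/* Hypothetical stand-in for a split block. */
typedef struct split_block {
  int owner_cpu;  /* CPU whose local memory is meant to hold this block */
  int in_use;     /* nonzero while a thread is using the block */
  /* position, move list, bounds, etc. would go here */
} split_block;

/* Array of pointers: each entry may point into a different node's memory. */
static split_block *blocks[MAX_CPUS * BLOCKS_PER_CPU];

/* In this sketch one thread allocates everything; a NUMA build would have
   each pinned thread allocate (and write to) its own CPU's blocks. */
static int init_blocks(void) {
  for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
    for (int i = 0; i < BLOCKS_PER_CPU; i++) {
      split_block *b = calloc(1, sizeof *b);
      if (b == NULL) return -1;
      b->owner_cpu = cpu;
      blocks[cpu * BLOCKS_PER_CPU + i] = b;
    }
  }
  return 0;
}

/* Prefer a free block local to `cpu`; otherwise take any free block.
   Returns NULL if the whole pool is in use. Caller holds the pool lock. */
static split_block *get_block(int cpu) {
  assert(cpu >= 0 && cpu < MAX_CPUS);
  for (int i = 0; i < BLOCKS_PER_CPU; i++) {
    split_block *b = blocks[cpu * BLOCKS_PER_CPU + i];
    if (!b->in_use) { b->in_use = 1; return b; }
  }
  for (int i = 0; i < MAX_CPUS * BLOCKS_PER_CPU; i++) {
    split_block *b = blocks[i];
    if (!b->in_use) { b->in_use = 1; return b; }
  }
  return NULL;
}

/* Return a block to the pool. Caller holds the pool lock. */
static void free_block(split_block *b) {
  assert(b != NULL && b->in_use);
  b->in_use = 0;
}
```

Once CPU 3's local blocks are exhausted, the next `get_block(3)` spills over to the first free block elsewhere (here CPU 0's), trading a remote access for not stalling the split.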


>
>
>
>>>On August 30, 2003 at 10:40:03, Vincent Diepeveen wrote:
>>>
>>>>On August 29, 2003 at 23:41:32, Jeremiah Penery wrote:
>>>>
>>>>>On August 29, 2003 at 18:40:23, Mridul Muralidharan wrote:
>>>>>
>>>>>>I'm not sure what Prof. Bob Hyatt meant by those comments, or why he made
>>>>>>them. But getting a program like Crafty to work properly on a NUMA machine
>>>>>>will not be trivial, and it won't be a matter of tweaks but something more.
>>>>>
>>>>>All multi-CPU Opteron machines are NUMA.  Crafty will work just fine in those.
>>>>>It will not be theoretically optimal, but that also depends on the OS to help
>>>>>with NUMA issues.
>>>>
>>>>The OS has to do very little for chess programs: just keep scheduling the
>>>>same process on the same CPU and physically allocate local memory in that
>>>>CPU's RAM.
>>>>
>>>>Of course, for a lot of other services the OS has to do things quite
>>>>differently, yet chess programs do not need that, as most of us (Cilkchess
>>>>being one exception) write our parallelism at a very low level.
>>>>
>>>>>>Duals, etc. count as SMP machines, not the cc-NUMA machines I was referring to.
>>>>>Dual Opterons are NUMA.
>>>>
>>>>And soon all duals that we can afford will be.
>>>>
>>>>Best regards,
>>>>Vincent
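
One common way to get what both posters describe (a thread that stays on one CPU, and memory that lands in that CPU's local RAM) without wrecking non-NUMA machines is to pin the thread first and then rely on Linux's default first-touch page placement. The following is a Linux-only sketch under those assumptions; `pin_to_cpu()`, `alloc_local()` and `pin_and_alloc()` are hypothetical helpers, not part of Crafty. Where libnuma is available, `numa_alloc_onnode()` is a more explicit alternative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Restrict the calling thread to a single CPU (Linux-specific).
   Returns 0 on success, -1 on failure. */
static int pin_to_cpu(int cpu) {
  cpu_set_t set;
  assert(cpu >= 0 && cpu < CPU_SETSIZE);
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof set, &set);
}

/* Allocate and write every byte from the calling thread. Under Linux's
   default memory policy a page is placed on the node of the CPU that first
   touches it, so call this only after pinning. malloc + memset is used
   instead of calloc because calloc may hand back pages nobody has touched.
   On a non-NUMA machine this is simply a zeroed allocation. */
static void *alloc_local(size_t bytes) {
  void *p = malloc(bytes);
  if (p != NULL) memset(p, 0, bytes);
  return p;
}

/* Pin to the first CPU this process may use, then allocate. If pinning
   fails, allocate anyway: the program still runs, just without the
   locality guarantee, and *cpu_out is left at -1. */
static void *pin_and_alloc(size_t bytes, int *cpu_out) {
  cpu_set_t allowed;
  *cpu_out = -1;
  if (sched_getaffinity(0, sizeof allowed, &allowed) == 0) {
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
      if (CPU_ISSET(cpu, &allowed) && pin_to_cpu(cpu) == 0) {
        *cpu_out = cpu;
        break;
      }
    }
  }
  return alloc_local(bytes);
}
```

In a real search each worker thread would call something like `pin_and_alloc()` with its own CPU number for its hash slice or split blocks; the fallback path is what keeps the same binary usable on plain SMP boxes.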




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.