Computer Chess Club Archives



Subject: Re: Crafty and NUMA

Author: Vincent Diepeveen

Date: 14:48:39 09/02/03



On September 02, 2003 at 11:00:13, Robert Hyatt wrote:

>On September 01, 2003 at 06:09:48, Mridul Muralidharan wrote:
>
>>Hi Jeremiah,
>>
>>  If you want crafty to work with a decent speedup on a 16 or 32 CPU
>>cc-NUMA where you have, say, 2 - 4 processors per node with a significant
>>inter-node latency (like most higher-CPU NUMA boxes ?!), you will have to
>>ensure that the way you split, memory usage, etc. is optimal - you don't want to
>>access a hash entry in proc 0 from proc 32 when the latency will be in
>>milliseconds !!!
>>
>>Hope you are appreciating the real world problems - not theoretical issues.
>>If you actually work on these boxes , you will appreciate the problems faced by
>>these developers more - using threads on such a box for parallelism , urggh !!
>>
>>Regards
>>Mridul
>>
>>PS : Forget getting crafty to work on the 500 CPU beast that Vincent is working
>>on without a total crafty rewrite ! The horrors Vincent must be facing are
>>unimaginable - the ones he has already mentioned in this forum and, I'm sure,
>>the more horrible ones he may not have !
>>
>
>A total rewrite is _not_ needed.  The search is already designed to work on
>_any_ type of parallel machine.  The issue is allocating data structures on

Wrong, it is needed.

No, it isn't designed to run on NUMA machines with latencies in the microseconds.

Note that your cluster, when using Myrinet network cards, will have a latency of 10 us.

>the right processor.  This will _not_ be a major change.  I currently use an

I will save this posting for sure :)

Been working a year fulltime now :)

>array of split blocks.  What I need is an array of pointers to split blocks,
>so that each processor can allocate a few split blocks in its local memory.
>Then the block allocator simply has to prefer split blocks in the processor
>that will be using them, when trying to allocate a split block for a parallel
>thread to use.
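
A minimal sketch of the allocation scheme described in the quoted paragraph above: an array of pointers to split blocks, a few blocks per processor node, and an allocator that prefers blocks on the node that will use them. This is not Crafty's code. `MAX_NODES`, `BLOCKS_PER_NODE`, `split_block`, `alloc_on_node` and the function names are all hypothetical, and `alloc_on_node` falls back to plain `calloc` because how to get node-local memory portably is exactly the open question in the thread. The sketch is also not thread-safe; a real allocator would need a lock around the search for a free block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES       4   /* hypothetical number of NUMA nodes */
#define BLOCKS_PER_NODE 8   /* hypothetical split blocks per node */

typedef struct split_block {
    int node;    /* node whose memory is meant to hold this block */
    int in_use;  /* nonzero while a search thread owns the block */
    /* per-split search state would go here */
} split_block;

/* Array of pointers to split blocks, grouped by node, instead of one
 * flat array of blocks that all live in a single node's memory. */
static split_block *blocks[MAX_NODES][BLOCKS_PER_NODE];

/* Assumption: placeholder for a node-local allocator. Here it ignores
 * the node and returns ordinary zeroed heap memory. */
static void *alloc_on_node(size_t size, int node)
{
    (void)node;
    return calloc(1, size);
}

/* Allocate every block once at startup. Returns 0 on success, -1 if an
 * allocation fails. */
int init_split_blocks(void)
{
    for (int n = 0; n < MAX_NODES; n++) {
        for (int i = 0; i < BLOCKS_PER_NODE; i++) {
            split_block *b = (split_block *)alloc_on_node(sizeof *b, n);
            if (b == NULL)
                return -1;
            b->node = n;
            blocks[n][i] = b;
        }
    }
    return 0;
}

/* Hand out a free block, trying the requesting node first (d == 0) and
 * then the other nodes in index order. Returns NULL if none is free. */
split_block *alloc_split_block(int node)
{
    assert(node >= 0 && node < MAX_NODES);
    for (int d = 0; d < MAX_NODES; d++) {
        int n = (node + d) % MAX_NODES;
        for (int i = 0; i < BLOCKS_PER_NODE; i++) {
            if (!blocks[n][i]->in_use) {
                blocks[n][i]->in_use = 1;
                return blocks[n][i];
            }
        }
    }
    return NULL;
}

/* Return a block to the pool. */
void free_split_block(split_block *b)
{
    b->in_use = 0;
}
```

A thread running on node 1 would call `alloc_split_block(1)` and receive a block from another node only once node 1's own blocks are all in use. The fallback order here is plain index order; a real machine would presumably want to order it by inter-node distance.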

>It is something I have on my list of things to do, but the real issue is
>"how to allocate _local_ memory" reliably, without wrecking things on non-
>NUMA machines?
>
>That's why I haven't looked at this lately.  I looked at it a year ago on a
>NUMA alpha box, but unfortunately the code was lost when the disk on that
>machine crashed with no backups.  I got a new disk, but the source changes
>were lost.  This was written around Compaq's UPC compiler..

You still don't have a clue what writing software for NUMA involves when the
hardware has latencies in the microsecond range.



>>On August 30, 2003 at 10:40:03, Vincent Diepeveen wrote:
>>
>>>On August 29, 2003 at 23:41:32, Jeremiah Penery wrote:
>>>
>>>>On August 29, 2003 at 18:40:23, Mridul Muralidharan wrote:
>>>>
>>>>>I'm not sure what comments Prof. Bob Hyatt made, or why. But
>>>>>getting a program like crafty to work properly on a NUMA machine will not be trivial
>>>>>- and it won't be tweaks, but something more.
>>>>
>>>>All multi-CPU Opteron machines are NUMA.  Crafty will work just fine in those.
>>>>It will not be theoretically optimal, but that also depends on the OS to help
>>>>with NUMA issues.
>>>
>>>The OS has to do very little for chess programs. Just keep scheduling the same
>>>process on the same CPU and physically allocate local memory in that CPU's
>>>RAM.
>>>
>>>Of course, for a lot of other services the OS has to do a lot more, yet in
>>>chess programs we do not need that, as most of us, except for example cilkchess,
>>>write our parallelism at a very low level.
>>>
>>>>>Duals, etc. count as SMP machines, not the cc-NUMA which I was referring to.
>>>>Dual Opterons are NUMA.
>>>
>>>And soon all duals that we can afford will be.
>>>
>>>Best regards,
>>>Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.