Computer Chess Club Archives


Subject: Re: Crafty and NUMA

Author: Vincent Diepeveen

Date: 09:16:11 09/03/03

On September 03, 2003 at 12:00:58, Robert Hyatt wrote:

>On September 03, 2003 at 09:26:31, Vincent Diepeveen wrote:
>
>>On September 02, 2003 at 18:37:05, Jeremiah Penery wrote:
>>
>>>On September 02, 2003 at 07:15:55, Mridul Muralidharan wrote:
>>>
>>>>
>>>><snip>
>>>>On September 01, 2003 at 09:39:55, Jeremiah Penery wrote:
>>>>
>>>>>
>>>>>Any large (multi-node) SMP machine will have the same problem as NUMA with
>>>>>respect to inter-node latency.  SMP doesn't magically make node-to-node
>>>>>communication any faster.
>>>>
>>>>Pardon my saying so, but it looks like you have very little idea about SMP and
>>>>NUMA.
>>>
>>>If I didn't have some idea what I was talking about, I wouldn't be talking,
>>>unlike a lot of people in these discussions.
>>>
>>>>Refer to the Cray architecture, an Opteron 8-way box architecture, and some
>>>>IBM supercomputer cc-NUMA system architecture docs for more info. I'm not
>>>
>>>Those machines are designed and built for *completely* different purposes.  You
>>>might as well compare the documentation for a P4 to that of an UltraSPARC, for
>>>all the good it would do you.
>>>
>>>>referring to just theoretical differences, or _only_ architecture differences,
>>>>but, as a programmer, to the details that need to be taken care of while writing
>>>>apps for such a system.
>>>
>>>And those details would be what, other than the aforementioned theoretical or
>>>architectural differences?
>>>
>>>>>But in reality, almost nobody uses a machine that big, especially for chess.
>>>>
>>>>The question was: can it be done, and is it just a bunch of tweaks? Not: do you
>>>>have a system.
>>>>Answer: yes, it can be done, but it needs lots of rewriting, not just "tweaks".
>>>
>>>Not really.  Bob said he already completed the changes, and it didn't really
>>>involve much.  The only difference was that instead of forking processes he had
>>>to manually start processes on each processor.  That really doesn't take much work.
>>
>>But that's of course not true.
>>
>>Why are you believing this nonsense?
>
>Would you like a name at Compaq?  They sent me an Alpha and an NDA copy of
>their UPC compiler to do this work.  I didn't publish anything due to the NDA,
>of course, but that has lapsed and the compiler is now commercially available.
>
>Why do _you_ write this nonsense???

So either
  a) you lose your source code whenever you achieve something important
     (a NUMA version of Crafty),
  b) you can't find even a Cray executable of Cray Blitz when someone
     offered to rerun your tests to verify your speedup numbers, or
  c) you signed an NDA, which proves that Crafty runs well on cc-NUMA machines
     with microsecond latencies, if I understand you correctly.

So stop the nonsense here, Bob.

Show outputs of Crafty on cc-NUMA machines with random-access latencies above
1 microsecond, or with a worst-case one-way ping-pong time above 500 ns.
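A one-way ping-pong time is usually measured by bouncing a message between two CPUs many times and dividing the total elapsed time by twice the number of round trips. The sketch below shows only that method, assuming two Python threads and a `queue.Queue` as the "ball" (the function name `one_way_pingpong_ns` is hypothetical). Interpreter and OS scheduling overhead add microseconds per hop, and the threads are not pinned to particular CPUs or nodes, so the number it prints says nothing about hardware latency. A real measurement would pin two threads to CPUs on different nodes and spin on a shared cache line in C.

```python
import queue
import threading
import time


def one_way_pingpong_ns(round_trips: int = 1000) -> float:
    """Estimate one-way latency as total round-trip time / (2 * round_trips).

    Sketch only: measures Python thread hand-off, not memory latency.
    """
    ping = queue.Queue(maxsize=1)  # main thread -> responder
    pong = queue.Queue(maxsize=1)  # responder -> main thread

    def responder() -> None:
        for _ in range(round_trips):
            msg = ping.get()   # wait for the ball
            pong.put(msg)      # send it straight back

    worker = threading.Thread(target=responder)
    worker.start()
    start = time.perf_counter_ns()
    for i in range(round_trips):
        ping.put(i)
        pong.get()             # one full round trip completed
    elapsed = time.perf_counter_ns() - start
    worker.join()
    return elapsed / (2 * round_trips)


if __name__ == "__main__":
    print(f"~{one_way_pingpong_ns():.0f} ns one way (interpreter overhead included)")
```

Dividing by two assumes the path is symmetric in both directions; on a NUMA machine you would repeat the run for every pair of nodes and report the worst pair.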

It never worked on them, of course, and it never will.

There is very little concrete output from Crafty on more than 4 CPUs. All I remember
is some very expensive 16-processor Alpha which ran Crafty briefly.

But that machine's latencies are not even remotely as bad as those of real
cc-NUMA machines. If you mail me a version of Crafty that compiles on an R14000,
I can do a few runs for you at 16 CPUs, or whatever number you want up to 130,
without problems. 500 even.

But Nalimov's code (the endgame tablebase probing) doesn't compile with the MIPSpro compiler. Dunno why.

I have also run software other than DIEP on up to 16 CPUs, and it all does
better than Crafty.

>
>>
>>>>>For any but the most extremely scalable architectures, there are significant
>>>>>diminishing returns when adding processors for chess playing.  I'd say that a
>>>>>very scalable 8-way SMP or NUMA (Opteron) machine will not be very much slower
>>>>>than even a 64-way Alpha/Itanium/xxx machine for chess.
>>>>
>>>>If badly programmed, then yes, there is not much difference between an 8-proc box
>>>>and a 64-proc box (the 64-proc box can actually perform worse!).
>>>>Which is exactly my point: you need to design a program specifically to run on
>>>>such a system, not take something that works on a 2- or 4-proc system and
>>>>expect it to work on a 64-proc system!
>>>
>>>The Alpha-Beta algorithm used for chess is a serial algorithm.  There's no
>>>getting around that.  The more processors you use, the less efficiency you will
>>>get, unless you use something other than Alpha-Beta.
>>>
>>>No matter how much you want to rewrite and "tweak" for a NUMA machine (or any
>>>kind of machine, for that matter), adding more and more processors is simply
>>>going to stop being beneficial at some point.
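The point that Alpha-Beta is serial can be made concrete with a small sketch. It assumes plain minimax alpha-beta on a hypothetical three-move toy tree; it is not Crafty's or DIEP's parallel search. Searched in order, the first root move yields a bound (alpha = 3) that lets the second move be refuted after a single leaf. If each root move were handed to a separate processor before that bound existed, every move would be searched with a full window and more leaves would be visited for the same result. That extra work is the search overhead that eats into the gain from each added processor.

```python
import math


def alphabeta(node, alpha, beta, maximizing, leaves):
    """Minimax alpha-beta over a nested-list tree; leaves[0] counts leaf visits."""
    if not isinstance(node, list):      # leaf: a static evaluation
        leaves[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, leaves))
            alpha = max(alpha, value)
            if alpha >= beta:           # cutoff: remaining siblings cannot matter
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, leaves))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value


# Hypothetical toy tree: a max-node root with three min-node children.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def serial_root(tree):
    """Search root moves one after another, so later moves inherit alpha."""
    leaves = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, leaves)
    return value, leaves[0]


def independent_root(tree):
    """Search every root move with a full window, as a processor would if it
    started before the earlier siblings had returned a bound."""
    leaves = [0]
    value = max(alphabeta(child, -math.inf, math.inf, False, leaves) for child in tree)
    return value, leaves[0]


print(serial_root(TREE))       # (3, 7)
print(independent_root(TREE))  # (3, 9)
```

Both searches return the same score, but the independent version visits 9 leaves instead of 7. In real trees, with good move ordering, the gap grows with depth and with the number of processors started without bounds.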



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.