Author: Robert Hyatt
Date: 10:42:08 09/03/03
On September 03, 2003 at 12:17:45, Vincent Diepeveen wrote:
>On September 03, 2003 at 12:04:01, Robert Hyatt wrote:
>
>>On September 03, 2003 at 05:13:07, Mridul Muralidharan wrote:
>>
>>>Hi,
>>>
>>>
>>>On September 02, 2003 at 18:37:05, Jeremiah Penery wrote:
>>>
>>>>On September 02, 2003 at 07:15:55, Mridul Muralidharan wrote:
>>>>
>>><snip>
>>>>
>>>>If I didn't have some idea what I was talking about, I wouldn't be talking,
>>>>unlike a lot of people in these discussions.
>>>>
>>>
>>>This should give you some direction to think about:
>>>http://www.talkchess.com/forums/1/message.html?313791
>>>
>>>Actually, more references could be given - but considering your set mindset, no
>>>thanks :)
>>>
>>>>>Refer to Cray architecture, an Opteron 8 way box architecture, and some
>>>>>IBM supercomp cc-NUMA based system architecture docs for more info. I'm not
>>>>
>>>>Those machines are designed and built for *completely* different purposes. You
>>>>might as well compare the documentation for a P4 to that of an UltraSPARC, for
>>>>all the good it would do you.
>>>>
>>>
>>>If you say a cc-NUMA is built for an entirely different purpose - definitely, I
>>>agree with you! Like Bob Hyatt mentions in the above-mentioned post -
>>>performance / price / scalability matrix works quite well for NUMA at higher
>>>number of processors.
>>>Which is exactly what I said - no point in saying Crafty (or any other program
>>>for that matter) will scale well on a 16 or 64 proc NUMA box just 'cos it scales
>>>well on a 4 proc SMP box.
>>>NUMA machines are a slightly different breed.
>>>
>>>>>referring to just theoretical differences, or _only_ architecture differences -
>>>>>but as a programmer - what details that need to be taken care of while writing
>>>>>apps for such a system.
>>>>
>>>>And those details would be what, other than the aforementioned theoretical or
>>>>architectural differences?
>>>>
>>>
>>>Quite simple - on an SMP or Cray box, typically you do not care much about
>>>latency for accessing memory being different for different processors, etc. As
>>>a programmer, you have to be aware of all these.
>>>Why do you think Linux on NUMA sucks ass?!!!
>>>Also, depending on how the box is configured, number of procs per node, etc -
>>>your memory management, thread/process splitting, etc (for a chess program that
>>>is) will have to be modified.
>>>Just because you might know what the architecture of the box is, does not imply
>>>that you will come up with a program which scales well on NUMA!
>>>
>>>
>>>>>>But in reality, almost nobody uses a machine that big, especially for chess.
>>>>>
>>>>>The question was - can it be done, is it just a bunch of tweaks - not do you
>>>>>have a system.
>>>>>Answer: Yes it can be done, needs lots of rewrite - not just "tweaks".
>>>>
>>>>Not really. Bob said he already completed the changes, and it didn't really
>>>>involve much. Only instead of forking processes he had to manually start
>>>>processes on each processor. That really doesn't take much work.
>>>>
>>>
>>>
>>>If it was just a bunch of tweaks that you mention here - I would love to see how
>>>much performance it will give on a 64/128 proc NUMA box :)
>>>I can make a guess - it will suck a**. (No offence to anyone here)
>>
>>You are mixing apples and oranges. How will it do on a 128 node SMP
>>box? How will it do on a 128 node NUMA box? _both_ will not do very well
>>since things are not tuned for that many processors. However, the original
>>NUMA port did pretty well on a 32 CPU box. Not as well as it would have done
>>on a 32 CPU SMP box however. But then NUMA won't _ever_ produce the same
>>level of performance as pure SMP boxes will. They are just much more
>>affordable.
>
>Bob, show that 32 cpu output and crafty version number with it.
>
>Thanks,
>Vincent
>

I could do that for a couple of test positions I happen to have saved the log
files from. What would it prove? That version of Crafty doesn't exist any
longer. It was never released, as it was a "work in progress" for the Compaq
UPC compiler that was not going to work on other platforms.

You'd say the data was fake. You'd wave your hands here and there and shout
"that is impossible".

I see no point in going there...

>>
>>>
>>>
>>>>>>For any but the most extremely scalable architectures, there are significant
>>>>>>diminishing returns when adding processors for chess playing. I'd say that a
>>>>>>very scalable 8-way SMP or NUMA (Opteron) machine will not be very much slower
>>>>>>than even a 64-way Alpha/Itanium/xxx machine for chess.
>>>>>
>>>>>If badly programmed, then yes not much difference between an 8 proc box and a 64
>>>>>proc box (actually it can be lower performing!).
>>>>>Which is exactly my point, you need to design a program specifically to run on
>>>>>such a system - not expect something that works on a 2 or 4 proc system and
>>>>>expect it to work for a 64 proc system!
>>>>
>>>>The Alpha-Beta algorithm used for chess is a serial algorithm. There's no
>>>>getting around that. The more processors you use, the less efficiency you will
>>>>get, unless you use something else than Alpha-Beta.
>>>>
>>>>No matter how much you want to rewrite and "tweak" for a NUMA machine (or any
>>>>kind of machine, for that matter), adding more and more processors is simply
>>>>going to stop being beneficial at some point.
>>>
>>>
>>>Just because alpha-beta is serial does not imply that it cannot scale well
>>>beyond the 4 or 8 or 16 proc boxes that it is shown to scale well to.
>>>I _have_ seen results of how well it scales :)
>>>Sadly I'm not at liberty to reveal them - but in a few months/next year or so,
>>>you will also see how well it scales when results are published.
>>>I'm not denying the limitations of the alpha-beta algo - definitely they exist - but
>>>not to the extent to which they are believed to exist.
Last modified: Thu, 15 Apr 21 08:11:13 -0700