Author: Vincent Diepeveen
Date: 09:17:45 09/03/03
On September 03, 2003 at 12:04:01, Robert Hyatt wrote:

>On September 03, 2003 at 05:13:07, Mridul Muralidharan wrote:
>
>>Hi,
>>
>>
>>On September 02, 2003 at 18:37:05, Jeremiah Penery wrote:
>>
>>>On September 02, 2003 at 07:15:55, Mridul Muralidharan wrote:
>>>
>><snip>
>>>
>>>If I didn't have some idea what I was talking about, I wouldn't be talking,
>>>unlike a lot of people in these discussions.
>>>
>>
>>This should give you some direction to think about :
>>http://www.talkchess.com/forums/1/message.html?313791
>>
>>Actually , more references could be given - but considering your set mindset, no
>>thanks :)
>>
>>>> Refer to cray architecture , an opteron 8 way box architecture , and some
>>>>IBM supercomp cc-NUMA based system architecture docs for more info. I'm not
>>>
>>>Those machines are designed and built for *completely* different purposes. You
>>>might as well compare the documentation for a P4 to that of an UltraSPARC, for
>>>all the good it would do you.
>>>
>>
>>If you say a cc-NUMA is built for an entirely different purpose - definitely , I
>>agree with you ! Like Bob Hyatt mentions in the above mentioned post -
>>performance / price / scalability matrix works quite well for NUMA at higher
>>number of processors.
>>Which is exactly what I said - no point in saying crafty (or any other program
>>for that matter) will scale well on a 16 or 64 proc NUMA box just 'cos it scales
>>well on a 4 proc smp box.
>>NUMA machines are a slightly different breed.
>>
>>>>referring to just theoretical differences , or _only_ architecture differences -
>>>>but as a programmer - what details need to be taken care of while writing
>>>>apps for such a system.
>>>
>>>And those details would be what, other than the aforementioned theoretical or
>>>architectural differences?
>>>
>>
>>Quite simple - on a smp or cray box , typically you do not care much about
>>latency for accessing memory being different for different processors , etc. As
>>a programmer , you have to be aware of all these.
>>Why do you think Linux on numa sucks ass ?!!!
>>Also , depending on how the box is configured , number of procs per node , etc -
>>your memory management , thread/process splitting , etc (for a chess program that
>>is) will have to be modified.
>>Just because you might know what the architecture of the box is , does not imply
>>that you will come up with a program which scales well on NUMA !
>>
>>
>>>>>But in reality, almost nobody uses a machine that big, especially for chess.
>>>>
>>>>The question was - can it be done , is it just a bunch of tweaks - not do you
>>>>have a system.
>>>>Answer : Yes it can be done , needs lots of rewrite - not just "tweaks".
>>>
>>>Not really. Bob said he already completed the changes, and it didn't really
>>>involve much. Only instead of forking processes he had to manually start
>>>processes on each processor. That really doesn't take much work.
>>>
>>
>>
>>If it was just a bunch of tweaks that you mention here - I would love to see how
>>much performance it will give on a 64/128 proc NUMA box :)
>>I can make a guess - it will suck a**. (No offence to anyone here)
>
>You are mixing apples and oranges. How will it do on a 128 node SMP
>box? How will it do on a 128 node NUMA box? _both_ will not do very well
>since things are not tuned for that many processors. However, the original
>NUMA port did pretty well on a 32 CPU box. Not as well as it would have done
>on a 32 CPU SMP box however. But then NUMA won't _ever_ produce the same
>level of performance as pure SMP boxes will. They are just much more
>affordable.

Bob, show that 32 cpu output and crafty version number with it.

Thanks,
Vincent

>
>>
>>
>>>>>For any but the most extremely scalable architectures, there are significant
>>>>>diminishing returns when adding processors for chess playing. I'd say that a
>>>>>very scalable 8-way SMP or NUMA (Opteron) machine will not be very much slower
>>>>>than even a 64-way Alpha/Itanium/xxx machine for chess.
>>>>
>>>>If badly programmed , then yes not much difference between a 8 proc box and a 64
>>>>proc box (actually it can be lower performing!).
>>>>Which is exactly my point , you need to design a program specifically to run on
>>>>such a system - not expect something that works on a 2 or 4 proc system and
>>>>expect it to work for a 64 proc system !
>>>
>>>The Alpha-Beta algorithm used for chess is a serial algorithm. There's no
>>>getting around that. The more processors you use, the less efficiency you will
>>>get, unless you use something else than Alpha-Beta.
>>>
>>>No matter how much you want to rewrite and "tweak" for a NUMA machine (or any
>>>kind of machine, for that matter), adding more and more processors is simply
>>>going to stop being beneficial at some point.
>>
>>
>>Just because alpha-beta is serial does not imply that it need not scale well
>>beyond the 4 or 8 or 16 proc boxes that it is shown to scale well to.
>>I _have_ seen results of how well it scales :)
>>sadly I'm not at liberty to reveal them - but in a few months/next year or so ,
>>you will also see how well it scales when results are published.
>>I'm not denying the limitations of alphabeta algo - definitely they exist - but
>>not to the extent to which it is believed to exist.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.