Computer Chess Club Archives


Subject: Re: Conclusion

Author: Robert Hyatt

Date: 10:59:12 12/27/03


On December 27, 2003 at 04:58:51, Mridul Muralidharan wrote:

>On December 26, 2003 at 19:29:57, Robert Hyatt wrote:
>
>>On December 26, 2003 at 18:40:57, Mridul Muralidharan wrote:
>>
>>>On December 26, 2003 at 16:13:26, Luis Smith wrote:
>>>
>>>>On December 26, 2003 at 15:34:43, Darren Rushton wrote:
>>>>
>>>>>>Actually what happens, is the 366 is SLOW.  And I mean SLOW.
>>>>>
>>>>>I don't intend to be controversial here, but the conclusion I draw from your
>>>>>results is that Shredder 7 is such a brilliant program it is almost a match for
>>>>>one of the better amateur programs on hardware that's almost 10 times
>>>>>slower.
>>>>>
>>>>>Regards,
>>>>>
>>>>>Darren
>>>>
>>>>I think you're missing the point of these experiments.  Some people here were
>>>>saying that Crafty isn't a world contender.  Bob could get much better hardware
>>>>than most of the commercials.  He mentioned something about a 32 way box.  Can
>>>>you imagine the speed of Crafty on something like that?
>>>>
>>>>I don't think anyone can count Crafty out after this experiment.
>>>
>>>You also need to get decent speedup on those boxes :)
>>>And I hope it is not a shared bus 32 proc bus ;)
>>>
>>>A 4/8 cpu opteron against a 32 proc alpha is not fair - for crafty - it would
>>>lose again against say shredder or fritz.
>>>
>>>Mridul
>>
>>
>>I'm not sure what you are saying in the above?  The 4-8cpu opteron will not
>>run programs well without some work.  I've already done it.  Just dropping
>>in deep fritz or something similar will not produce great results, from past
>>experience.  As far as 32 proc alpha, it depends on the box.  I got reasonable
>>scaling on the 32 cpu version I used last year at Compaq.
>
>4 things are important here.
>
>1) If deep fritz/shredder/etc gets released which supports quad/8 way opteron -
>then it will be ported and tested. And the authors will ensure that there is a
>decent speedup.
>Don't tell me that they are never going to figure out how to get their program
>working on a numa opteron box :) - Nalimov could have a good job at crafty - but
>even other people would figure out what to do from their specs and docs.
>And I have a suspicion that some already have ;)

I didn't suggest that at all.  But the question seemed to be based on _today_
not _next year_.  Today's SMP programs need some changes for the Opteron or
they will run into some interesting cache and memory reference problems.  None
are hard to fix.  But they _do_ have to be fixed for reasonable results...




>
>2) The alpha proc has a disadvantage in latency and processing power w.r.t the
>opteron - so it is never a 1:4 or 1:8 h/w advantage between the two machines -
>much lower.


It depends on the program and the programmer most likely.  I have not run
on recent alphas, but I ran on a 21264 at 666mhz a while back and was getting
around 1M nps.  The last time I ran on a 16-way alpha, the NPS scaled at
something around 14X+, I will have to see if I can dig up the old logs.  A
group of doctors bought such a machine here about 2-3 years ago and I ran
on it on ICC for a dozen games or so one afternoon.  This was a 21164
machine with 16 cpus, and a single CPU was doing about 500K, the 16-way
box was doing about 7M (SJLIM watched a few games so he might remember
the actual numbers, but I don't have any logs myself).  That was OK scaling.
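
A quick check of the scaling arithmetic above, as a minimal Python sketch. The NPS figures are the approximate ones quoted from memory in this post (about 500K on one 21164 CPU, about 7M on the 16-way box); the function and variable names are made up for illustration.

```python
def nps_scaling(single_cpu_nps, multi_cpu_nps, cpus):
    """Return (raw NPS scaling factor, per-CPU efficiency)."""
    factor = multi_cpu_nps / single_cpu_nps
    efficiency = factor / cpus
    return factor, efficiency

# Approximate figures quoted in the post, not measured logs.
factor, efficiency = nps_scaling(500_000, 7_000_000, 16)
print(f"{factor:.1f}x NPS scaling, {efficiency:.1%} per-CPU efficiency")
# prints "14.0x NPS scaling, 87.5% per-CPU efficiency"
```

This covers raw NPS only. Time-to-solution speedup is a separate measurement.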



>Also - what is crafty speedup here ? Any numbers ? What kind of machine is it ?
>Shared bus ? - in which case you are dead due to bus contention.
>numa ? - I thought you said crafty works only on windows and intel/amd. You have
>crafty working for alpha also ?


If you have been following the discussions here, you might recall that I
was working with Compaq last year on an alpha-based NUMA version of Crafty.
That is why it was so quick to get a NUMA version of Crafty ready again this
year when the Opteron idea came up.  I had already done it once, although
the alpha I had here lost the disk drive and all the source changes.
Fortunately the changes were not that drastic, although I never completed
_all_ the things that needed doing.  On the 32-way box I was seeing about
13X faster _searches_. (I am not talking about NPS here but pure time to
solution).  It was lower than the number I wanted to see, but it needed some
program changes to further improve it.  By contrast, on a Cray T932 the NPS
scales by almost exactly 32X, yet the speedup there was closer to 18x than the
22x predicted by the formula I have given.  However, Crafty doesn't do vectors
like Cray Blitz did, so there was more to be had from that machine that I
didn't try to get...
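
The gap between NPS scaling and real speedup can be put in numbers with a short Python sketch. The post does not spell out the formula it refers to; the sketch assumes the linear rule of thumb speedup = 1 + 0.7 * (N - 1), which gives about 22.7 for N = 32 and so lines up with the "predicted 22x". The 0.7 constant and all names here are assumptions for illustration, and the observed figures are the rough ones quoted above (13x on the 32-way Alpha box, 18x on the T932).

```python
def predicted_speedup(cpus, per_cpu_gain=0.7):
    """Assumed rule of thumb: each CPU after the first adds per_cpu_gain."""
    return 1 + (cpus - 1) * per_cpu_gain

# (machine, cpus, observed time-to-solution speedup) as quoted in the post
observations = [
    ("32-way Alpha NUMA box", 32, 13.0),
    ("Cray T932", 32, 18.0),
]

for machine, cpus, observed in observations:
    predicted = predicted_speedup(cpus)
    print(f"{machine}: predicted {predicted:.1f}x, observed {observed:.0f}x, "
          f"{observed / predicted:.0%} of prediction")
```

Changing per_cpu_gain shows how sensitive the prediction is to that one constant, which is why the observed numbers are best read as rough comparisons rather than pass/fail results.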





>
>3) A program scaling at 4 or 8 proc is going to be much higher than at 16 or 32.


Again, that is not a statement you can make without a qualifier.  For example,
Cray Blitz scaled perfectly at 32.  Whatever it got at 16, it got 2x that at 32.
My point is that your statement may or may not be true.  It depends on the
architecture, and what memory looks like.  Clearly, for NUMA boxes the
scaling is going to be worse than for a machine based on a pure cross-bar
like the Crays.  How much worse is a subject for great debate.  For instance, I
have some code in Crafty that was designed for a machine that had multiple
processors per "node".  Hence my idea about "processor groups" in Crafty.
It has not been fully tweaked and tuned, but the code has been there for
several years to better fit machines where some processors work together
"better" than others because of the "node" concept.

So a lot depends on the architecture.  A lot more depends on the program
and the programmer's understanding of the architecture and what makes it look
good or bad.

Theoretically, _nothing_ prevents a program from scaling to 1024 processors.
Practically, it is a real challenge.  But, unlike our resident NUMA expert,
I'm not about to write NUMA and clusters and NUMA clusters off.  I think the
problems are significant, but hardly unsolvable...




>
>4) Even with a 1:10 or 1:8 advantage crafty only barely manages to catch up or
>beat these top order programs - so not much of a chance if they show up with the
>above mentioned machines.

That was what was not so clear in your post.  If you assume opteron-optimized
fritz, vs Crafty on a significantly bigger box, maybe.  But the concept we are
talking about was crafty at the 2003 WCCC event, not crafty in 2 years.  That
means a big opteron, itanium or alpha machine vs a 4-way or 8-way xeon.  And
from experience, the 8-way xeons are _not_ very good.  They use the same
memory system as the 4-way boxes, which means 2x the cpus, 1x the memory
bandwidth.  Not a good mix for programs that really have a high memory bandwidth
requirement, like chess engines, whose big hash tables run afoul of the PIV's
long L2 cache lines and the cache conflicts that result.
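
The "2x the cpus, 1x the memory bandwidth" point is simple division, sketched here in Python. Bandwidth is in arbitrary units because the post gives no absolute figure, and the names are hypothetical.

```python
from fractions import Fraction

def per_cpu_bandwidth(total_bandwidth, cpus):
    """Share of a fixed memory bandwidth available to each CPU."""
    return Fraction(total_bandwidth, cpus)

shared = 1  # same memory system on both boxes, arbitrary units
four_way = per_cpu_bandwidth(shared, 4)
eight_way = per_cpu_bandwidth(shared, 8)
print(f"4-way: {four_way} per CPU, 8-way: {eight_way} per CPU, "
      f"ratio {eight_way / four_way}")
```

Each CPU on the 8-way box gets half the bandwidth of a CPU on the 4-way box, before any cache-conflict effects are counted.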

So, again, the question was "did I not go because Crafty had no chance?"  The
answer is clearly "no".  Crafty beat Rebel pretty handily at 8:1.  It scraped
by Junior.  It may or may not beat Shredder.  But, it _is_ competitive.  It
had chances.  That's the point here.

Given another year of NUMA activity, it will be _more_ competitive.

>
>Mridul


