Computer Chess Club Archives


Subject: Re: 8-way Opteron machine at last available

Author: Robert Hyatt

Date: 10:33:51 08/24/04


On August 24, 2004 at 13:11:24, Torstein Hall wrote:

>On August 24, 2004 at 11:52:53, Robert Hyatt wrote:
>
>>On August 24, 2004 at 11:00:18, Vincent Diepeveen wrote:
>>
>>>On August 24, 2004 at 10:42:51, Robert Hyatt wrote:
>>>
>>>>On August 24, 2004 at 09:06:05, Jorge Pichard wrote:
>>>>
>>>>>On August 24, 2004 at 06:16:26, Vincent Lejeune wrote:
>>>>>
>>>>>>
>>>>>>A SYSTEM INTEGRATOR has started selling 5U eight way Opteron systems.
>>>>>>
>>>>>>http://www.theinquirer.net/?article=18035
>>>>>>
>>>>>>I think it's the first 8-way system since the Opteron was introduced.
>>>>>>
>>>>>>Great news for computer chess, where a lot of 4-way machines have been used in
>>>>>>tournaments over the past year!
>>>>>
>>>>>
>>>>>It would have been a tough fight if Shredder had been using one against Hydra :-)
>>>>>
>>>>>Jorge
>>>>
>>>>
>>>>1.  It takes even more tuning as it is still a NUMA box.  On the 4-way and 2-way
>>>>boxes memory is local, 1 hop or 2 hops away.  This adds to that.
>>>>
>>>>2.  It won't be 2x faster as nobody scales perfectly.  IE Crafty would probably
>>>
>>>Scaling              = the increase in nodes a second.
>>>Speedup (efficiency) = the speedup in time you get out of the box
>>
>>No.  Those are _your_ definitions.
>>
>>Traditional scaling means simply "as you increase the number of processors, how
>>much does that reduce the total runtime."  There are very _few_ applications
>>that exhibit this NPS vs search time anomaly.  Nobody cares in the world of
>>parallel programming.
>>
>>I care because if I can't run 4x the NPS on 4 processors, I am losing something
>>that I don't necessarily have to lose.  Hence the stuff done before the WCCC to
>>solve this on the Opterons, which started off producing pretty bad NPS increases.
>>
>>But the rest of the world only cares about total runtime...
>>
>>
>>>
>>>DIEP scales 100% on such 8 processor boxes.
>>
>>So do I.
>>
>>>
>>>>be about 1.7X faster, more or less depending on lots of things.  That is not
>>>>enough to make up for the apparent difference in playing strength between
>>>>Shredder and Hydra.  IE Hydra appears to be 200+ points stronger based on a
>>>>final result of 6-2.  1.7X faster won't get 200 points for Shredder...
>>>
>>>To my knowledge, Hydra currently runs on a 2-processor FPGA system (new FPGA
>>>processors), as Chrilly is busy rewriting his parallel search.
>>
>>Web site contradicts that but since I don't have access to real data, I have no
>>idea what they are running on.  But based on the results against Shredder, I
>>really have trouble believing they are using just two processors.  They
>>apparently are at least 200 Elo stronger based on the match.
>
>Is it not a bit early to draw such a conclusion after an 8-game match? I guess
>you have seen a lot of longer series where the outscored program turns it around
>and scores better later on. And statistically, I do not think 200 points can be
>claimed with high probability.
>
>Torstein

While I agree somewhat, there are some circumstances that led to my conclusion:

(1) long time control so no "blitz mistakes" creep in.

(2) primary authors are running both so there is little chance of a poor
configuration set-up to skew the results.

(3) the games themselves make it look almost easy at times.  And when a program
wins "easily" it is news.

My impression is that Hydra is simply out-searching Shredder badly.  I don't
have quite the same feeling about Hydra's evaluation as I have seen a few
strange moves that other programs don't even consider.  But then I saw the
_same_ thing back in the Deep Thought days, and just maybe those "strange moves"
are really best with a very deep/fast search.
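
For reference, the arithmetic behind reading a rating gap out of a 6-2 result can be sketched as below. This is only an illustration under assumptions that are not in the thread: it uses the standard logistic Elo formula, and it puts a Wilson score interval on the score fraction by treating the 8 games as independent win/loss trials, which ignores draws. The function names are made up for the example.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def wilson_interval(points, games, z=1.96):
    """Wilson score interval for the score fraction.

    Assumption: each game is an independent win/loss trial. Draws are
    ignored, so this is only a rough guide for chess results.
    """
    p = points / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2.0 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games + z2 / (4.0 * games * games)) / denom
    return center - half, center + half


if __name__ == "__main__":
    points, games = 6, 8  # the 6-2 match result discussed above
    print("point estimate: %+.0f Elo" % elo_diff(points / games))
    low, high = wilson_interval(points, games)
    print("rough 95%% range: %+.0f to %+.0f Elo" % (elo_diff(low), elo_diff(high)))
```

Under these assumptions the point estimate for 6-2 lands a little under 200, while the interval from only 8 games runs from below zero to well above 200. So the number by itself supports both readings, which is why the circumstances listed above carry most of the weight.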


>
>
>>
>>>
>>>He has to, as they were already talking some time ago about a 512-processor Hydra
>>>version (they = the University of Paderborn, which doesn't do the actual
>>>implementation of the parallel algorithm; Chrilly does that).
>>>
>>>The current implementation of Hydra doesn't store anything in hashtables for
>>>the last 3 plies in software, not to mention the last 3 plies in hardware.
>>>
>>>The entire hashtable from each node gets broadcast to all other nodes and
>>>stored there.
>>>
>>>That's trivially an O(N^2) operation and doesn't scale.
>>>
>>>The actual speedup of Hydra has not been objectively measured so far. Just claiming
>>>12 out of 16 without showing any actual data, while already knowing that the
>>>single-CPU test doesn't use a hashtable in the last 3 plies, where any software
>>>program does do that on a single CPU, is trivially not a very nice comparison.
>>
>>I haven't seen _any_ parallel search data other than my own, so all I can
>>comment on is what I get...
>>
>>
>>>
>>>The 8-processor Opteron cannot be compared with the cluster on which Hydra will
>>>soon run again, once the parallelism has been successfully rewritten into something
>>>that actually works better.
>>>
>>>The latency of a single ping-pong operation is 16 microseconds on the hardware
>>>located in Paderborn. Note that each node has 2 processors there, and
>>>the new hardware being built in the UAE is 2 machines of 8 processors connected to
>>>each other.
>>>
>>>>These machines are not bad.  There are _several_ companies with 8-way boxes
>>>
>>>There is not a single company selling 8-processor Opteron boxes. It is well
>>>known that there are some beta versions of those boxes, which several companies
>>>have already been using for testing for some years.
>>
>>Since I haven't tried to buy one, I won't comment.  I _have_ run on one from two
>>different vendors within the past 12 months.  And Sun was advertising one a
>>while back, whether they were shipping or not I can't say.
>>
>>>
>>>>ready to go.  I ran on one at least 6 months ago.  AMD has had one in their
>>>>development lab since well before the last CCT event...



Last modified: Thu, 15 Apr 21 08:11:13 -0700
