Computer Chess Club Archives


Subject: Re: 8-way Opteron machine at last available

Author: Torstein Hall

Date: 10:11:24 08/24/04


On August 24, 2004 at 11:52:53, Robert Hyatt wrote:

>On August 24, 2004 at 11:00:18, Vincent Diepeveen wrote:
>
>>On August 24, 2004 at 10:42:51, Robert Hyatt wrote:
>>
>>>On August 24, 2004 at 09:06:05, Jorge Pichard wrote:
>>>
>>>>On August 24, 2004 at 06:16:26, Vincent Lejeune wrote:
>>>>
>>>>>
>>>>>A SYSTEM INTEGRATOR has started selling 5U eight way Opteron systems.
>>>>>
>>>>>http://www.theinquirer.net/?article=18035
>>>>>
>>>>>I think it's the first 8-way system since the Opteron was introduced.
>>>>>
>>>>>Great news for computer chess, where a lot of 4-way machines have been used in
>>>>>tournaments for the past year!
>>>>
>>>>
>>>>It would have been a tough fight if Shredder had been using one against Hydra :-)
>>>>
>>>>Jorge
>>>
>>>
>>>1.  It takes even more tuning as it is still a NUMA box.  On the 4-way and 2-way
>>>boxes memory is local, 1 hop or 2 hops away.  This adds to that.
>>>
>>>2.  It won't be 2x faster as nobody scales perfectly.  I.e. Crafty would probably
>>
>>Scaling              = the increase in nodes a second.
>>Speedup (efficiency) = the speedup in time you get out of the box
>
>No.  Those are _your_ definitions.
>
>Traditional scaling means simply "as you increase the number of processors, how
>much does that reduce the total runtime."  There are very _few_ applications
>that exhibit this NPS vs search time anomaly.  Nobody cares in the world of
>parallel programming.
>
>I care because if I can't run 4x the NPS on 4 processors, I am losing something
>that I don't necessarily have to lose.  Hence the stuff done before the WCCC to
>solve this on the Opterons, which started off producing pretty bad NPS increases.
>
>But the rest of the world only cares about total runtime...
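
The two measures being argued about here can be put side by side in a small sketch. It is only an illustration: the function names and all node counts and speeds below are hypothetical, not measurements from Crafty, DIEP, or any machine mentioned in the thread. It shows why 4x the NPS does not have to mean 4x less search time when the parallel search visits extra nodes.

```python
def nps_scaling(nps_serial, nps_parallel):
    """Ratio of nodes per second, parallel vs. single CPU."""
    return nps_parallel / nps_serial


def time_speedup(time_serial, time_parallel):
    """Ratio of wall-clock time to finish the same search."""
    return time_serial / time_parallel


def search_overhead(nodes_serial, nodes_parallel):
    """Extra nodes the parallel search visits for the same result."""
    return nodes_parallel / nodes_serial


# Hypothetical numbers for a 4-CPU run (assumptions, not measured data).
nodes_serial = 100_000_000
nps_serial = 1_000_000
nodes_parallel = 130_000_000   # parallel search visits 30% more nodes
nps_parallel = 4_000_000       # NPS scales perfectly: 4x

time_serial = nodes_serial / nps_serial
time_parallel = nodes_parallel / nps_parallel

scaling = nps_scaling(nps_serial, nps_parallel)
speedup = time_speedup(time_serial, time_parallel)
overhead = search_overhead(nodes_serial, nodes_parallel)

print(f"NPS scaling:     {scaling:.2f}x")
print(f"Time speedup:    {speedup:.2f}x")
print(f"Search overhead: {overhead:.2f}x")
```

With these definitions the time speedup is always the NPS scaling divided by the search overhead, so perfect NPS scaling is only an upper bound on how much sooner the search finishes.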
>
>
>>
>>DIEP scales 100% on such 8 processor boxes.
>
>So do I.
>
>>
>>>be about 1.7X faster, more or less depending on lots of things.  That is not
>>>enough to make up for the apparent difference in playing strength between
>>>Shredder and Hydra.  IE Hydra appears to be 200+ points stronger based on a
>>>final result of 6-2.  1.7X faster won't get 200 points for Shredder...
>>
>>As far as I know, Hydra currently runs on a 2-processor FPGA system (new FPGA
>>processors), as Chrilly is busy rewriting his parallel search.
>
>The web site contradicts that, but since I don't have access to real data, I have no
>idea what they are running on.  But based on the results against Shredder, I
>really have trouble believing they are using just two processors.  They
>apparently are at least 200 Elo stronger based on the match.

Is it not a bit early to draw such a conclusion after an 8-game match? I guess
you have seen a lot of longer series where the outscored program turns it around
and scores better later on. And statistically, I do not think 200 points can be
claimed with high probability.
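
A rough way to check this is sketched below. It assumes the usual logistic Elo model and treats the match as 8 independent win/loss trials with a Wilson score interval at roughly 95% (z = 1.96). Both are simplifying assumptions: draws reduce the variance, so the real interval is somewhat narrower. The function names are made up for this illustration.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def wilson_interval(points, games, z=1.96):
    """Wilson score interval for the true score fraction."""
    p = points / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games
                         + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def elo_interval(points, games, z=1.96):
    """Convert the score interval into an Elo-difference interval."""
    low, high = wilson_interval(points, games, z)
    return elo_diff(low), elo_diff(high)


# The 6-2 result discussed in this thread.
point_estimate = elo_diff(6 / 8)
low_elo, high_elo = elo_interval(6, 8)

print(f"Point estimate: {point_estimate:+.0f} Elo")
print(f"Approximate interval: {low_elo:+.0f} to {high_elo:+.0f} Elo")
```

Under these assumptions the point estimate for 6-2 is a bit under 200 Elo, but the interval is very wide and includes zero, which is the reason for caution with so few games.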

Torstein


>
>>
>>He has to, as they were already talking some time ago about a 512-processor Hydra
>>version (they = University of Paderborn, which doesn't do the actual implementation
>>of the parallel algorithm; Chrilly does that).
>>
>>The current implementation of Hydra doesn't store anything in hash tables for the
>>last 3 plies in software, not to mention the last 3 plies in hardware.
>>
>>The entire hash table from each node is broadcast to all other nodes and
>>stored there.
>>
>>That's trivially an O(N^2) operation and doesn't scale.
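
The quadratic growth claimed here can be illustrated with a simple count. This is only a sketch of the arithmetic: it assumes every node sends its table once to every other node per exchange, and it says nothing about how Hydra actually communicates. The function name is hypothetical.

```python
def broadcast_messages(nodes):
    """Point-to-point transfers when each node sends to every other node."""
    return nodes * (nodes - 1)


# Node counts mentioned in the thread: 2, 16 and 512 processors.
for n in (2, 16, 512):
    print(f"{n:4d} nodes -> {broadcast_messages(n):7d} transfers per exchange")
```

Doubling the node count roughly quadruples the number of transfers, which is why an all-to-all exchange of this kind stops being practical on large clusters.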
>>
>>The actual speedup of Hydra has not been objectively measured so far. Just claiming 12
>>out of 16 without showing any actual data, while already knowing that the single-CPU
>>test doesn't use a hash table for the last 3 plies, where any software program does
>>so on a single CPU, is trivially not a very nice comparison.
>
>I haven't seen _any_ parallel search data other than my own, so all I can
>comment on is what I get...
>
>
>>
>>The 8-processor Opteron cannot be compared with the cluster on which Hydra will soon
>>run again, once the parallelism has been successfully rewritten into something
>>that actually works better.
>>
>>The latency of a single ping-pong operation is 16 microseconds on the hardware
>>located in Paderborn. Note that each node has 2 processors there, and
>>the new hardware being built in the UAE is 2 machines of 8 processors connected to
>>each other.
>>
>>>These machines are not bad.  There are _several_ companies with 8-way boxes
>>
>>There is not a single company selling 8-processor Opteron boxes. It is well
>>known that there are some beta versions of those boxes, which several companies have
>>been using for testing for some years already.
>
>Since I haven't tried to buy one, I won't comment.  I _have_ run on one from two
>different vendors within the past 12 months.  And Sun was advertising one a
>while back, whether they were shipping or not I can't say.
>
>>
>>>ready to go.  I ran on one at least 6 months ago.  AMD has had one in their
>>>development lab since well before the last CCT event...



Last modified: Thu, 15 Apr 21 08:11:13 -0700
