Author: Robert Hyatt
Date: 10:33:51 08/24/04
On August 24, 2004 at 13:11:24, Torstein Hall wrote:

>On August 24, 2004 at 11:52:53, Robert Hyatt wrote:
>
>>On August 24, 2004 at 11:00:18, Vincent Diepeveen wrote:
>>
>>>On August 24, 2004 at 10:42:51, Robert Hyatt wrote:
>>>
>>>>On August 24, 2004 at 09:06:05, Jorge Pichard wrote:
>>>>
>>>>>On August 24, 2004 at 06:16:26, Vincent Lejeune wrote:
>>>>>
>>>>>>A SYSTEM INTEGRATOR has started selling 5U eight way Opteron systems.
>>>>>>
>>>>>>http://www.theinquirer.net/?article=18035
>>>>>>
>>>>>>I think it's the first 8-way system since the beginning of Opteron.
>>>>>>
>>>>>>Great news for computer chess, where a lot of 4-way boxes have been used in
>>>>>>tournaments for the past year!
>>>>>
>>>>>It would have been a tough fight if Shredder was using one against Hydra :-)
>>>>>
>>>>>Jorge
>>>>
>>>>1. It takes even more tuning as it is still a NUMA box. On the 4-way and 2-way
>>>>boxes memory is local, 1 hop or 2 hops away. This adds to that.
>>>>
>>>>2. it won't be 2x faster as nobody scales perfectly. IE Crafty would probably
>>>
>>>Scaling = the increase in nodes a second.
>>>Speedup (efficiency) = the speedup in time you get out of the box
>>
>>No. Those are _your_ definitions.
>>
>>traditional scaling means simply "as you increase the number of processors, how
>>much does that reduce the total runtime." There are very _few_ applications
>>that exhibit this NPS vs search time anomaly. Nobody cares in the world of
>>parallel programming.
>>
>>I care because if I can't run 4x the NPS on 4 processors, I am losing something
>>that I don't necessarily have to lose. Hence the stuff done before the WCCC to
>>solve this on the Opterons, which started off producing pretty bad NPS increases.
>>
>>But the rest of the world only cares about total runtime...
>>
>>>
>>>DIEP scales 100% on such 8 processor boxes.
>>
>>So do I.
>>
>>>
>>>>be about 1.7X faster, more or less depending on lots of things.
>>>>That is not enough to make up for the apparent difference in playing strength
>>>>between Shredder and Hydra. IE Hydra appears to be 200+ points stronger based
>>>>on a final result of 6-2. 1.7X faster won't get 200 points for Shredder...
>>>
>>>To my information Hydra runs currently on a 2 processor FPGA system. New FPGA
>>>processors, as Chrilly is busy rewriting his parallel search.
>>
>>Web site contradicts that but since I don't have access to real data, I have no
>>idea what they are running on. But based on the results against Shredder, I
>>really have trouble believing they are using just two processors. They
>>apparently are at least 200 Elo stronger based on the match.
>
>Is it not a bit early to draw such a conclusion after an 8-game match? I guess
>you have seen a lot longer series where the outscored program turns it around
>and scores better later on. And statistically I do not think it can be said 200
>points with high probability.
>
>Torstein

While I agree somewhat, there are some circumstances that led to my conclusion:

(1) long time control so no "blitz mistakes" creep in.

(2) primary authors are running both so there is little chance of a poor
configuration set-up to skew the results.

(3) the games themselves make it look almost easy at times. And when a program
wins "easily" it is news.

My impression is that Hydra is simply out-searching Shredder badly. I don't
have quite the same feeling about Hydra's evaluation as I have seen a few
strange moves that other programs don't even consider. But then I saw the
_same_ thing back in the Deep Thought days, and just maybe those "strange moves"
are really best with a very deep/fast search.

>
>>
>>>
>>>He has to, as they were already talking some time ago about a 512 processor
>>>Hydra version (they = University Paderborn, which doesn't do the actual
>>>implementation of the parallel algorithm, Chrilly does do that).
>>>
>>>The current implementation of Hydra doesn't store last 3 ply in software, not to
>>>mention the last 3 ply in hardware, anything in hashtables.
>>>
>>>The entire hashtable from each node gets broadcast to all other nodes and
>>>stored there.
>>>
>>>That's a O(N^2) operation trivially and doesn't scale.
>>>
>>>The actual speedup of Hydra is not objectively measured so far. Just claiming 12
>>>out of 16 without showing any actual data and already knowing that the single
>>>cpu test doesn't use a hashtable for the last 3 ply, where any software program
>>>does do that single cpu, is not a very nice comparison trivially.
>>
>>I haven't seen _any_ parallel search data other than my own, so all I can
>>comment on is what I get...
>>
>>>
>>>The 8 processor Opteron cannot be compared with the cluster on which Hydra soon
>>>again will run when the parallelism has been successfully rewritten to something
>>>that actually works better.
>>>
>>>The latency to do a single pingpong operation is 16 microseconds on the hardware
>>>which is located in Paderborn. Note that each node has 2 processors there and
>>>the new hardware being built in UAE is 2 machines of 8 processors connected to
>>>each other.
>>>
>>>>These machines are not bad. There are _several_ companies with 8-way boxes
>>>
>>>There is not a single company selling 8 processor Opteron boxes. It is well
>>>known there are some beta versions of those boxes which several companies have
>>>been using for testing for some years already.
>>
>>Since I haven't tried to buy one, I won't comment. I _have_ run on one from two
>>different vendors within the past 12 months. And Sun was advertising one a
>>while back, whether they were shipping or not I can't say.
>>
>>>
>>>>ready to go. I ran on one at least 6 months ago. AMD has had one in their
>>>>development lab since well before the last CCT event...
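
For what it is worth, both the "200 points from 6-2" figure and the objection that 8 games is a small sample can be checked with a few lines of arithmetic. What follows is only a rough sketch, assuming the standard logistic Elo model and a crude normal approximation that treats the 8 games as independent win/loss trials (draws are ignored). The function names are made up for this illustration and do not come from any chess library.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def score_interval(points, games, z=1.96):
    """Rough interval on the score fraction (normal approximation).

    Assumption: every game is an independent win/loss trial; draws are ignored.
    The result is clamped away from 0 and 1 so elo_diff() stays finite.
    """
    p = points / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return max(p - half_width, 0.01), min(p + half_width, 0.99)

points, games = 6.0, 8          # the 6-2 match result discussed above
low, high = score_interval(points, games)

print("point estimate:", round(elo_diff(points / games)))
print("rough interval:", round(elo_diff(low)), "to", round(elo_diff(high)))
```

The point estimate comes out near 191 Elo, which is where the "200 or so" number comes from. The lower end of the rough interval falls below zero, so eight games by themselves cannot establish the gap; the circumstantial points (long time control, authors operating, how easy the wins looked) have to carry the rest of the argument.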
Last modified: Thu, 15 Apr 21 08:11:13 -0700