Author: Tony Werten
Date: 21:36:27 08/31/04
On August 31, 2004 at 10:52:58, Robert Hyatt wrote:

>On August 31, 2004 at 02:39:29, Tony Werten wrote:
>
>>On August 30, 2004 at 17:43:23, Robert Hyatt wrote:
>>
>>>On August 30, 2004 at 17:14:32, Gerd Isenberg wrote:
>>>
>>>>On August 30, 2004 at 16:57:50, Robert Hyatt wrote:
>>>>
>>>>>On August 30, 2004 at 16:39:22, Mark Young wrote:
>>>>>
>>>>>>On August 30, 2004 at 15:33:19, Robert Hyatt wrote:
>>>>>>
>>>>>>>On August 30, 2004 at 14:51:01, Uri Blass wrote:
>>>>>>>
>>>>>>>>On August 30, 2004 at 13:51:48, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On August 30, 2004 at 12:24:54, Volker Böhm wrote:
>>>>>>>>>
>>>>>>>>>>On August 30, 2004 at 10:02:54, Robert Hyatt wrote:
>>>>>>>>>>
>>>>>>>>>>>On August 30, 2004 at 08:30:34, Kurt Utzinger wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On August 30, 2004 at 08:12:52, Jouni Uski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>One FPGA card currently examines about 3 million positions/second. 16
>>>>>>>>>>>>>cards therefore theoretically make 48 MPos/sec. (Donninger)
>>>>>>>>>>>>>
>>>>>>>>>>>>>Jouni
>>>>>>>>>>>>
>>>>>>>>>>>> If Hydra made 48 Mpos/sec this again proves (in comparison
>>>>>>>>>>>> with the 2 Mpos/sec on Quad-Opteron server with 4 CPU's of
>>>>>>>>>>>> Shredder) that the number of pos/sec can't be taken as a
>>>>>>>>>>>> reliable value for the goodness of a chess program. It's
>>>>>>>>>>>> of course simply impossible to compare apples and oranges.
>>>>>>>>>>>> Kurt [http://www.utzingerk.com]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Don't forget that Hydra ripped Shredder's head off. So the NPS _might_ be
>>>>>>>>>>>significant here...
>>>>>>>>>>
>>>>>>>>>>Didn't I hear you say that 10 games are not enough to draw a
>>>>>>>>>>statistically significant conclusion on the playing strength?
>>>>>>>>>>
>>>>>>>>>>Greetings Volker
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>With two _close_ opponents, correct. But if one is seriously stronger, as hydra
>>>>>>>>>appeared to be, 10 games is plenty.
>>>>>>>>
>>>>>>>>We do not know if hydra is seriously stronger.
>>>>>>>
>>>>>>>We have a pretty good clue that it is. It is over 10x faster, potentially, than
>>>>>>>other programs.
>>>>>>>
>>>>>>>1. I first assume that the programmer / designer is no dummy.
>>>>>>>
>>>>>>>2. all else being "equal" 10x faster is a _serious_ advantage.
>>>>>>>
>>>>>>>3. the above two points translate into a significant strength advantage.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>You cannot start by assuming that hydra is significantly stronger when this is
>>>>>>>>the question.
>>>>>>>
>>>>>>>With evidence, you can. IE I can certainly assume that Crafty on an 8-way
>>>>>>>opteron is significantly stronger than Crafty on my dual xeon.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>If you see 10-0 you can say based on the result that Hydra is significantly
>>>>>>>>stronger but when you see 5.5-2.5 you cannot claim it based on the result and
>>>>>>>>you only can say that you do not know if it is significantly stronger based on
>>>>>>>>the result.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>If you only look at the results, maybe or maybe not. But I watched many of the
>>>>>>>games with Crafty analyzing. That tells you even more.
>>>>>>
>>>>>>Common sense should tell us that Hydra is stronger. It should have a big
>>>>>>hardware advantage. I think this is your point, and I agree. But I still need
>>>>>>more data to be sure. Right now there is only a 1 in 6 chance that Hydra is the
>>>>>>stronger program based on the games alone.
>>>>>
>>>>>Where does "one in six" come from?
>>>>>
>>>>>IE
>>>>>
>>>>>1. Hydra is at least 10x faster
>>>>>
>>>>>2. It won three and drew five if I recall. which is 5.5/8.0. which is right
>>>>>at +200 on the Elo scale.
>>>>>
>>>>>Yes the number of games is low, but the 1 in 6 seems very weak. IE when you
>>>>>consider _everything_. And don't forget that older versions (Brutus in
>>>>>particular) played on ICC, and the current version is playing on playchess.
>>>>>There are a lot of games. It is nowhere near invincible. But it is _very_
>>>>>strong compared to other programs.
>>>>>
>>>>>Of course I am not _sure_. But I am fairly well convinced. :)
>>>>
>>>>Yes, but Hydra had the advantage of preparing and tuning against a publicly
>>>>available Shredder, while Shredder did not have the chance.
>>>>
>>>>Anyway I am curious about Hydra's further speedups, and "fear" they will
>>>>dominate the scene for some time.
>>>>
>>>>The FPGAs they currently use are still not the fastest.
>>>>FPGAs may have more future speed growth than general purpose hardware for some
>>>>time. More and "wider" chess "alus" for evaluation purposes.
>>>>PCI express will speed up the hardware-software communication.
>>>>Even if it is not efficient, an additional ply in hardware is reasonable with
>>>>respect to the possible PCI bottleneck.
>>>>
>>>>Do you have an idea about the parallel speedup of this 8*2 cluster, about six?
>>>
>>>
>>>I really can't make an educated guess.
>>>
>>>A cluster will be less efficient than a pure SMP box. But to take this in
>>>"pieces" we might get a number that is reasonable.
>>>
>>>IE 8 nodes. I'd think that if it was 4x faster with 8 nodes that would be a
>>>win. Since the nodes have to talk via message-passing, sharing the hash table
>>>will be a problem. 4x faster would be worth doing.
>>>
>>>Inside a node, there are two FPGA boards as I understand this. I don't know how
>>>much of the search is done in one of these FPGA boards, but whatever it is,
>>>there is an efficiency loss with no hashing going on. Of course I don't hash in
>>>the q-search, and I believe Amir has said he doesn't hash in the last normal ply
>>>of his search either, so perhaps this is not a killer. The main problem is that
>>>this hardware apparently has to do a fixed-depth search, which limits split
>>>points and accumulates significant amounts of "idle time" (ie if it is necessary
>>>to only do hardware searches at ply=N, sometimes ply=N is an ALL node and the
>>>parallel search will work well, other times it is a CUT node and parallel search
>>>won't work at all). What they lose to deal with this I don't know. But I'd
>>>guess that 1.5x would be a number I would be happy with.
>>
>>Chrilly told me they use a slightly variable depth to solve this. ie Add 1 ply
>>to the hardware search on odd plies (or even ?) and pass the software search to
>>the cards 1 ply earlier.
>>
>>Tony
>>
>
>That sounds like a solution with even more problems. IE one ply deeper in
>hardware is one ply of "simpler" search, further exacerbated by no hash tables
>in the hardware. IE the best approach for a limited hardware search is to tack
>it on the end of the search using as few plies as possible since those plies are
>inferior to the software search plies...

Yes, maybe they lower the hardware depth and send it to the cards 1 ply later?
So 1 ply more in software. Anyway, I was told the hardware search depth was 2 or
3 ply, depending on the node type.

>
>DB had a solution to this, as it supported hash table probes in the hardware,
>but Hsu never had time to design/build the multi-ported memory it required. But
>he knew it was a long-term issue that needed a solution obviously...
>
>One other note. Even "adding a ply" won't solve the CUT/ALL problem. For
>example, take Crafty and pick some remaining depth point, such as 3 plies, and
>for an even or odd iteration, count the number of times that depth is a CUT or
>ALL node. It will vary wildly, independently of the iteration depth, because of
>search extensions, move ordering, etc...

Hmm, I'm not sure. My own experience is that extensions tend to stabilize this
variation, i.e. the odd/even effect (which is another result of the same problem,
I think) gets smaller when you extend more.

Tony

>
>>>
>>>That makes your 6X faster (rather than 16X the theoretical max) a fairly decent
>>>guess, although it _is_ still a guess. But 6x times 3M nodes per second is
>>>still faster than any current programs on 4-way opterons. I've passed that
>>>speed on 8-way boxes (and beyond) but that hardware is pretty rare to come by.
>>>They are no doubt very strong.
>>>
>>>
>>>>I have no idea how well these message passing clusters scale with a huge number
>>>>of nodes >= 32 or 256, 512 or 1024. Vincent was pessimistic about that, iirc.
>>>>But it seems that internode bandwidth and latency also have some potential to
>>>>become faster and faster to make bigger clusters more efficient.
>>>>
>>>>>>>>Uri
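
For reference, the 6x estimate in the quoted text can be written out as a quick calculation. This is only a sketch of that arithmetic: the 4x cluster factor and the 1.5x per-node factor are the guesses from the quoted post, not measurements, and the function and variable names are made up for illustration.

```python
# Rough arithmetic behind the quoted "6x" speedup guess.
# All factors are guesses taken from the quoted discussion, not measured values.

CARD_NPS = 3_000_000   # ~3 MPos/sec per FPGA card (figure attributed to Donninger)
NUM_CARDS = 16         # 8 nodes * 2 cards per node


def estimated_speedup(cluster_factor: float, per_node_factor: float) -> float:
    """Combine the two guessed factors multiplicatively."""
    return cluster_factor * per_node_factor


# Theoretical maximum if all 16 cards scaled perfectly.
theoretical_nps = CARD_NPS * NUM_CARDS

# Guess: 8 nodes give about 4x, and 2 cards inside a node give about 1.5x.
speedup = estimated_speedup(cluster_factor=4.0, per_node_factor=1.5)
estimated_nps = speedup * CARD_NPS

# Fraction of the theoretical 16x that the guess amounts to.
efficiency = speedup / NUM_CARDS

print(f"theoretical: {theoretical_nps / 1e6:.0f} MPos/sec")
print(f"estimated:   {estimated_nps / 1e6:.0f} MPos/sec (speedup {speedup:.1f}x)")
print(f"efficiency:  {efficiency:.1%}")
```

Changing either guessed factor shows how sensitive the estimate is; the effective node rate is just the product of the two factors and the per-card rate.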
Last modified: Thu, 15 Apr 21 08:11:13 -0700