Author: Vasik Rajlich
Date: 03:43:40 08/31/04
On August 30, 2004 at 19:43:15, Mark Young wrote:

>On August 30, 2004 at 19:22:16, James T. Walker wrote:
>
>>On August 30, 2004 at 16:39:22, Mark Young wrote:
>>
>>>On August 30, 2004 at 15:33:19, Robert Hyatt wrote:
>>>
>>>>On August 30, 2004 at 14:51:01, Uri Blass wrote:
>>>>
>>>>>On August 30, 2004 at 13:51:48, Robert Hyatt wrote:
>>>>>
>>>>>>On August 30, 2004 at 12:24:54, Volker Böhm wrote:
>>>>>>
>>>>>>>On August 30, 2004 at 10:02:54, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On August 30, 2004 at 08:30:34, Kurt Utzinger wrote:
>>>>>>>>
>>>>>>>>>On August 30, 2004 at 08:12:52, Jouni Uski wrote:
>>>>>>>>>
>>>>>>>>>>One FPGA card currently searches about 3 million positions/second. 16
>>>>>>>>>>cards therefore theoretically make 48 MPos/sec. (Donninger)
>>>>>>>>>>
>>>>>>>>>>Jouni
>>>>>>>>>
>>>>>>>>> If Hydra made 48 Mpos/sec this again proves (in comparison
>>>>>>>>> with the 2 Mpos/sec of Shredder on a Quad-Opteron server
>>>>>>>>> with 4 CPUs) that the number of pos/sec can't be taken as a
>>>>>>>>> reliable measure of the quality of a chess program. It's
>>>>>>>>> of course simply impossible to compare apples and oranges.
>>>>>>>>> Kurt [http://www.utzingerk.com]
>>>>>>>>
>>>>>>>>
>>>>>>>>Don't forget that Hydra ripped Shredder's head off. So the NPS _might_ be
>>>>>>>>significant here...
>>>>>>>
>>>>>>>Haven't I heard you say that 10 games are not enough to draw a
>>>>>>>statistically significant conclusion about playing strength?
>>>>>>>
>>>>>>>Greetings Volker
>>>>>>
>>>>>>
>>>>>>With two _close_ opponents, correct. But if one is seriously stronger, as Hydra
>>>>>>appeared to be, 10 games is plenty.
>>>>>
>>>>>We do not know if Hydra is seriously stronger.
>>>>
>>>>We have a pretty good clue that it is. It is over 10x faster, potentially, than
>>>>other programs.
>>>>
>>>>1. I first assume that the programmer / designer is no dummy.
>>>>
>>>>2. All else being "equal", 10x faster is a _serious_ advantage.
>>>>
>>>>3. The above two points translate into a significant strength advantage.
>>>>
>>>>
>>>>>
>>>>>You cannot start by assuming that Hydra is significantly stronger when this is
>>>>>the question.
>>>>
>>>>With evidence, you can. I.e., I can certainly assume that Crafty on an 8-way
>>>>Opteron is significantly stronger than Crafty on my dual Xeon.
>>>>
>>>>
>>>>>
>>>>>If you see 10-0 you can say based on the result that Hydra is significantly
>>>>>stronger, but when you see 5.5-2.5 you cannot claim it based on the result and
>>>>>you can only say that you do not know if it is significantly stronger based on
>>>>>the result.
>>>>>
>>>>
>>>>
>>>>If you only look at the results, maybe or maybe not. But I watched many of the
>>>>games with Crafty analyzing. That tells you even more.
>>>
>>>Common sense should tell us that Hydra is stronger. It should have a big
>>>hardware advantage. I think this is your point, and I agree. But I still need
>>>more data to be sure. Right now there is only a 1 in 6 chance that Hydra is the
>>>stronger program based on the games alone.
>>
>>***************
>>Where do you get a 1 in 6 chance that Hydra is stronger??? If that's true then
>>what are the odds that Shredder is stronger?? Maybe 5 in 6 ???
>>****************
>>
>
>You must understand.... The odds are so low because not many games were played.
>Now if Hydra had won 8-0 or 7-1 then you would have a much higher certainty. An
>8-0 score shows more than 100 to 1 against Shredder being stronger. A score of 7-1
>would give 20 to 1 against Shredder being stronger. Now when you have a match
>score of 5.5-2.5, it tells you almost nothing. It is close to being 50-50
>odds, or to be exact 65 to 45 that Hydra is stronger.
>

These "mathematical" tables are so abused that it would probably be better if
they didn't exist.

For example:

A 5.5-2.5 self-play result, with two similar versions no more than five rating
points apart, does not allow you to draw a statistically significant conclusion.

OTOH, if you know that two programs differ in strength by 200 points, then a
5.5-2.5 result will allow you to identify the stronger side with near-100%
certainty.

If you insist on having a table with only a match score as input and a
single-value statistical significance as output, you can expect nonsense.

Vas

>
>>>
>>>>
>>>>
>>>>
>>>>>Uri
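To put rough numbers on the point about prior knowledge, here is a minimal sketch. It is not anyone's actual table from this thread; it assumes the standard logistic Elo expectation, a fixed 30% draw rate, and a prior that allows only two equally likely cases (program A is exactly `gap` points stronger, or exactly `gap` points weaker). The function names are made up for illustration.

```python
from math import comb


def expected_score(elo_diff):
    """Logistic Elo expectation for a side that is elo_diff points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def match_likelihood(elo_diff, points, games, draw_rate=0.3):
    """Probability of scoring exactly `points` in `games` games.

    Assumption: each game is an independent win/draw/loss trial with a
    fixed draw rate; win and loss probabilities are chosen so the mean
    score per game equals the Elo expectation.
    """
    e = expected_score(elo_diff)
    p_win = e - draw_rate / 2.0
    p_loss = 1.0 - e - draw_rate / 2.0
    if p_win < 0.0 or p_loss < 0.0:
        raise ValueError("draw_rate is too high for this rating difference")

    half_points = round(2 * points)  # work in half-points to avoid floats
    total = 0.0
    for draws in range(games + 1):
        remaining = half_points - draws
        if remaining < 0 or remaining % 2:
            continue  # wins would be negative or fractional
        wins = remaining // 2
        losses = games - wins - draws
        if losses < 0:
            continue
        ways = comb(games, wins) * comb(games - wins, draws)
        total += ways * p_win ** wins * draw_rate ** draws * p_loss ** losses
    return total


def posterior_stronger(gap, points, games, draw_rate=0.3):
    """P(A is the stronger side | result), given a 50/50 prior on +gap vs -gap."""
    like_stronger = match_likelihood(+gap, points, games, draw_rate)
    like_weaker = match_likelihood(-gap, points, games, draw_rate)
    return like_stronger / (like_stronger + like_weaker)


if __name__ == "__main__":
    for gap in (5, 200):
        p = posterior_stronger(gap, 5.5, 8)
        print(f"gap of {gap} Elo: P(winner of 5.5-2.5 is stronger) = {p:.3f}")
```

Under these assumptions the same 5.5-2.5 score gives about 0.53 when the two sides are known to be 5 points apart and above 0.99 when they are known to be 200 points apart. The match score alone does not fix the answer; the prior does most of the work.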
Last modified: Thu, 15 Apr 21 08:11:13 -0700