Author: Jorge
Date: 08:55:20 01/23/04
On January 23, 2004 at 10:45:02, Rolf Tueschen wrote:

>On January 23, 2004 at 10:13:32, Richard Pijl wrote:
>
>>On January 22, 2004 at 21:02:19, Rolf Tueschen wrote:
>>
>>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>>
>>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>>
>>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>>If my calculations are correct, I get the following:
>>>>>>
>>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>>
>>>>>>+90 -65 =145
>>>>>>
>>>>>>=> 162.5 - 137.5
>>>>>>
>>>>>>=> 54.17 %
>>>>>>
>>>>>>=>
>>>>>>Elo difference = +29
>>>>>>95 % confidence interval: [+1, +58]
>>>>>>
>>>>>>That means that based on this 300-game match (for this particular time control
>>>>>>on this particular computer with these particular settings etc.), your best
>>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>>%.
>>>>
>>>>
>>>>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>>>>probable as 58 Elo. There is no way to hypostate that Elo 29 is the "best"
>>>>guess. With a defined confidence int. of 95% you get a variance of 1 to 58 Elo
>>>>points. Then you look how your results are differing for two progs. All results
>>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>>that the two progs could be equally strong. You need at least Elo +-59
>>>
>>>[correction: you need simply 59 for the difference between progs] for a
>>>>claim of being better.
>>
>>What is estimated above using statistical methods is the difference in ELO
>>between Shredder 8 and Shredder 7.04. The difference is estimated to be +29,
>>where the confidence interval (95%) of the difference is +1 - +58. This means
>>that with the probability of 97.5% Shredder 8 is stronger by at least 1 ELO
>>point.
>>What do you not understand here?
>
>Richard,
>you got it the wrong way around. Look if you speak of INNER confidence marge
>there is nothing you can say while if you get a higher number for a difference
>THEN you have a significant difference in strength. Excuse me for being firm in
>what must be said. It's just stats. Nothing
>what I have calculated or invented. Nothing personal between the two of us I
>hope.
>
>Rolf

Rolf,

I agree that there is something wrong here with this method. I would go about it
by assuming that the two programs are of equal strength, making the hypothesis
that the Elo of S8 is > S7, and taking it from there. Forgive me for being vague
on this subject, but confidence intervals rest on underlying assumptions, such as
Normal distributions. Were we talking about average Elo differences between the
progs? And how did we come up with upper/lower bounds of 58 and 1?

Just my 2 cents' worth on this topic.

Cheers,
jorge

>
>
>
>>
>>Richard.
>>
>>
>>>
>>>>- NB you propose that the two progs are equally
>>>>strong and then you test against it. You must top 58. [all this on the base of a
>>>>specific N of games, the results calculated in Elo; I didn't follow the debate
>>>>but normally you calculate with scores from the games/matches just for
>>>>mentioning it]
>>>>
>>>>Rolf
>>>>
>>>>
>>>>>>
>>>>>>So if you "only" want to show that S8 is better, you can - statistically
>>>>>>speaking - stop now. If you want to "prove" that it is more than 20 Elo points
>>>>>>better, you need a few more games indeed...
>>>>>>
>>>>>>Best regards - Munjong.
>>>>>
>>>>>
>>>>>
>>>>>It's great to see that at least one guy is able to correctly interpret match
>>>>>results here.
>>>>>
>>>>>I hope you will post more often on this subject. Information on it is very much
>>>>>needed here.
>>>>>
>>>>>
>>>>>
>>>>> Christophe
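
On the question of where bounds like 1 and 58 can come from: the thread does not say how they were computed, so the following is only a sketch of one common approach, not necessarily the method used for the quoted numbers. It assumes a normal approximation for the mean score per game (win = 1, draw = 0.5, loss = 0), builds a 95 % interval on that score, and converts the score and both interval ends to Elo with the usual logistic formula. The function names `elo_from_score` and `match_elo_interval` are made up for this illustration.

```python
import math
from statistics import NormalDist


def elo_from_score(p):
    """Convert a score fraction p (0 < p < 1) to an Elo difference (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))


def match_elo_interval(wins, losses, draws, confidence=0.95):
    """Estimate the Elo difference and an approximate confidence interval from one match.

    Assumption: games are independent and the mean score is roughly normally
    distributed. Does not handle degenerate matches (e.g. all wins), where the
    standard error is zero or the score fraction hits 0 or 1.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n  # mean score per game

    # Per-game variance of an outcome taking values 1, 0.5 or 0.
    mean_of_squares = (wins * 1.0 + draws * 0.25) / n
    variance = mean_of_squares - score ** 2
    std_err = math.sqrt(variance / n)  # standard error of the mean score

    # Two-sided critical value, about 1.96 for 95 %.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low, high = score - z * std_err, score + z * std_err

    # One-sided probability (under the same approximation) that the true score exceeds 50 %.
    p_better = NormalDist().cdf((score - 0.5) / std_err)

    return {
        "score": score,
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(low),
        "elo_high": elo_from_score(high),
        "p_better": p_better,
    }


# The match quoted in the thread: +90 -65 =145 over 300 games.
result = match_elo_interval(wins=90, losses=65, draws=145)
print(round(result["elo"]), round(result["elo_low"]), round(result["elo_high"]))
# prints: 29 1 58
```

Under these assumptions the sketch reproduces the +29 estimate and the [+1, +58] interval from the quoted match. Because draws are counted as half points, the per-game variance is smaller than it would be for a pure win/loss model, which is why the interval is narrower than a simple binomial calculation would give.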
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.