Author: Rolf Tueschen
Date: 09:13:39 01/23/04
On January 23, 2004 at 11:55:20, Jorge wrote:

>On January 23, 2004 at 10:45:02, Rolf Tueschen wrote:
>
>>On January 23, 2004 at 10:13:32, Richard Pijl wrote:
>>
>>>On January 22, 2004 at 21:02:19, Rolf Tueschen wrote:
>>>
>>>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>>>
>>>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>>>
>>>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>>>If my calculations are correct, I get the following:
>>>>>>>
>>>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>>>
>>>>>>>+90 -65 =145
>>>>>>>
>>>>>>>=> 162.5 - 137.5
>>>>>>>
>>>>>>>=> 54.17 %
>>>>>>>
>>>>>>>=>
>>>>>>>Elo difference = +29
>>>>>>>95 % confidence interval: [+1, +58]
>>>>>>>
>>>>>>>That means that based on this 300-game match (for this particular time control
>>>>>>>on this particular computer with these particular settings etc.), your best
>>>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>>>%.
>>>>>
>>>>>
>>>>>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>>>>>probable as 58 Elo. There is no way to hypostate that Elo 29 is the "best"
>>>>>guess. With a defined confidence int. of 95% you get a variance of 1 to 58 Elo
>>>>>points. Then you look how your results are differing for two progs. All results
>>>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>>>that the two progs could be equally strong. You need at least Elo +-59
>>>>
>>>>[correction: you need simply 59 for the difference between progs] for a
>>>>>claim of being better.
>>>
>>>What is estimated above using statistical methods is the difference in ELO
>>>between Shredder 8 and Shredder 7.04. The difference is estimated to be +29,
>>>where the confidence interval (95%) of the difference is +1 - +58. This means
>>>that with the probability of 97.5% Shredder 8 is stronger by at least 1 ELO
>>>point.
>>>What do you not understand here?
>>
>>Richard,
>>you got it the wrong way around. Look if you speak of INNER confidence marge
>>there is nothing you can say while if you get a higher number for a difference
>>THEN you have a significant difference in strength. Excuse me for being firm in
>>what must be said. It's just stats. Nothing
>>what I have calculated or invented. Nothing personal between the two of us I
>>hope.
>>
>>Rolf
>
>Rolf,
>
>I agree that there is something wrong here with this method. I would go about it
>assuming that they are equal strength and make a hypothesis that Elo of S8 is >
>S7 and take it from there. Forgive me for being vague on this subject, but
>Confidence intervals underline assumptions, like Normal distributions. Were we
>talking about ave. ELO differences between the progs? And how did we come up
>with Upper/Lower bounds of 58 and 1?
>Just my 2 cents worth on this topic.

You must be joking, you got it better than many experts here around. Exactly for
the same reasons you gave, I asked my simple question what the Bell curve should
be good for. What I got was "you are dumb - it's the Bell curve! Just look into
a good book about stats." This is what Hyatt called hand-waving, others call it
hot air.

But seriously. Only if you are very experienced you can directly discover when
something is shaky or totally wrong. I saw that and you too.

Rolf

>
>Cheers,
>jorge
>>
>>
>>
>>>
>>>Richard.
>>>
>>>
>>>>
>>>>>- NB you propose that the two progs are equally
>>>>>strong and then you test against it. You must top 58. [all this on the base of a
>>>>>specific N of games, the results calculated in Elo; I didn't follow the debate
>>>>>but normally you calculate with scores from the games/matches just for
>>>>>mentioning it]
>>>>>
>>>>>Rolf
>>>>>
>>>>>
>>>>>>>
>>>>>>>So if you "only" want to show that S8 is better, you can - statistically
>>>>>>>speaking - stop now. If you want to "prove" that it is more than 20 Elo points
>>>>>>>better, you need a few more games indeed...
>>>>>>>
>>>>>>>Best regards - Munjong.
>>>>>>
>>>>>>
>>>>>>
>>>>>>It's great to see that at least one guy is able to correctly interpret match
>>>>>>results here.
>>>>>>
>>>>>>I hope you will post more often on this subject. Information on it is very much
>>>>>>needed here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Christophe
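
Jorge's question of how the bounds of 1 and 58 came about can be answered with a short calculation. The sketch below is an assumption about the method, not something Kolss stated: it takes the usual logistic Elo formula and a normal approximation on the per-game score (win = 1, draw = 0.5, loss = 0). The function names are made up for illustration. With +90 -65 =145 it gives roughly +29 with an interval of about +1 to +58, so it is at least consistent with the numbers quoted in the thread.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_elo_interval(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high, z_score) for a single match.

    Assumption: games are independent and the mean score is treated as
    approximately normal; z=1.96 gives a two-sided 95 % interval.
    """
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n                 # mean score per game
    # Variance of a single game result taking values 1, 0.5 or 0.
    var = (wins * 1.0 + draws * 0.25) / n - p * p
    se = math.sqrt(var / n)                      # standard error of the mean score
    z_score = (p - 0.5) / se                     # distance from "equal strength"
    return (elo_from_score(p),
            elo_from_score(p - z * se),
            elo_from_score(p + z * se),
            z_score)

elo, low, high, z_score = match_elo_interval(90, 65, 145)
# One-sided normal tail for "score above 50 %", under the same approximation.
one_sided = 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))

print(f"Elo difference: {elo:+.1f}")
print(f"95 % interval:  [{low:+.1f}, {high:+.1f}]")
print(f"one-sided tail: {one_sided:.3f}")
```

The interval is computed on the score scale first and only then converted to Elo, which is why it is not exactly symmetric around +29. Whether the one-sided value may be read as "the likelihood that S8 is better" is the interpretive point argued over in the thread; the code only reproduces the arithmetic.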
Last modified: Thu, 15 Apr 21 08:11:13 -0700