Computer Chess Club Archives


Subject: Re: Some stats...

Author: Rolf Tueschen

Date: 09:13:39 01/23/04


On January 23, 2004 at 11:55:20, Jorge wrote:

>On January 23, 2004 at 10:45:02, Rolf Tueschen wrote:
>
>>On January 23, 2004 at 10:13:32, Richard Pijl wrote:
>>
>>>On January 22, 2004 at 21:02:19, Rolf Tueschen wrote:
>>>
>>>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>>>
>>>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>>>
>>>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>>>If my calculations are correct, I get the following:
>>>>>>>
>>>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>>>
>>>>>>>+90 -65 =145
>>>>>>>
>>>>>>>=> 162.5 - 137.5
>>>>>>>
>>>>>>>=> 54.17 %
>>>>>>>
>>>>>>>=>
>>>>>>>Elo difference = +29
>>>>>>>95 % confidence interval: [+1, +58]
>>>>>>>
>>>>>>>That means that based on this 300-game match (for this particular time control
>>>>>>>on this particular computer with these particular settings etc.), your best
>>>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>>>%.
>>>>>
>>>>>
>>>>>This is wrong. Stats don't work this way. In your example above, 1 Elo is as
>>>>>probable as 58 Elo. There is no way to postulate that Elo 29 is the "best"
>>>>>guess. With a defined confidence interval of 95% you get a range of 1 to 58 Elo
>>>>>points. Then you look at how your results differ for the two progs. All results
>>>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>>>that the two progs could be equally strong. You need at least Elo +-59
>>>>
>>>>[correction: you simply need 59 for the difference between the progs] for a
>>>>>claim of being better.
>>>
>>>What is estimated above using statistical methods is the difference in ELO
>>>between Shredder 8 and Shredder 7.04. The difference is estimated to be +29,
>>>where the 95% confidence interval of the difference is +1 to +58. This means
>>>that with a probability of 97.5% Shredder 8 is stronger by at least 1 ELO
>>>point.
>>>What do you not understand here?
>>
>>Richard,
>>you got it the wrong way around. Look, if you speak of the INNER confidence
>>margin, there is nothing you can say, while if you get a higher number for a
>>difference, THEN you have a significant difference in strength. Excuse me for
>>being firm in what must be said. It's just stats. Nothing
>>that I have calculated or invented. Nothing personal between the two of us, I
>>hope.
>>
>>Rolf
>
>Rolf,
>
>I agree that there is something wrong here with this method. I would go about it
>by assuming that they are of equal strength, making the hypothesis that the Elo
>of S8 is > S7, and taking it from there. Forgive me for being vague on this
>subject, but confidence intervals rest on underlying assumptions, like normal
>distributions. Were we talking about average Elo differences between the progs?
>And how did we come up with upper/lower bounds of 58 and 1?
>Just my 2 cents worth on this topic.


You must be joking; you got it better than many experts around here. For exactly
the same reasons you gave, I asked my simple question: what was the bell curve
supposed to be good for? What I got was "you are dumb - it's the Bell curve! Just
look into a good book about stats." This is what Hyatt called hand-waving; others
call it hot air.

But seriously: only if you are very experienced can you spot right away when
something is shaky or totally wrong. I saw that, and so did you.

Rolf
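
Jorge's question about where the bounds of +1 and +58 come from can be answered with a short calculation. The sketch below is one common way to get such numbers, not necessarily the one Kolss used, and it rests on stated assumptions: the 300 games are independent, a draw counts as half a point, the mean score is approximated by a normal distribution (z = 1.96 for 95%), and scores are converted to Elo with the logistic formula D = -400 * log10(1/p - 1). It also runs the test Jorge describes, against the null hypothesis of equal strength (expected score 0.5). The function names are illustrative, not from any chess tool.

```python
import math

def score_to_elo(p: float) -> float:
    """Logistic Elo curve (assumed model): expected score per game -> rating difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_stats(wins: int, losses: int, draws: int, z: float = 1.96) -> dict:
    """Illustrative helper: Elo estimate, z-interval and a test against equal strength."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n                  # mean points per game
    var = (wins + 0.25 * draws) / n - score ** 2      # variance of one game's result (1, 0.5 or 0)
    se = math.sqrt(var / n)                           # standard error of the mean score
    low, high = score - z * se, score + z * se        # normal-approximation interval on the score
    z_equal = (score - 0.5) / se                      # H0: equal strength, expected score 0.5
    p_one_sided = 0.5 * math.erfc(z_equal / math.sqrt(2.0))  # P(Z >= z_equal) under H0
    return {
        "score": score,
        "elo": score_to_elo(score),
        "elo_low": score_to_elo(low),
        "elo_high": score_to_elo(high),
        "z": z_equal,
        "p_one_sided": p_one_sided,
    }

stats = match_stats(wins=90, losses=65, draws=145)
print(f"score {stats['score']:.4f}  Elo {stats['elo']:+.0f}  "
      f"95% interval [{stats['elo_low']:+.0f}, {stats['elo_high']:+.0f}]")
print(f"z = {stats['z']:.2f}, one-sided p = {stats['p_one_sided']:.3f}")
```

Under these assumptions it reproduces the quoted +29 and [+1, +58], and gives a z of about 2.02 against equal strength (one-sided p of about 0.02).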

>
>Cheers,
>jorge
>>
>>
>>
>>>
>>>Richard.
>>>
>>>
>>>>
>>>>>- NB: you propose that the two progs are equally
>>>>>strong and then you test against that. You must top 58. [all this on the basis
>>>>>of a specific N of games, with the results calculated in Elo; I didn't follow the
>>>>>debate, but normally you calculate with the scores from the games/matches, just
>>>>>to mention it]
>>>>>
>>>>>Rolf
>>>>>
>>>>>
>>>>>>>
>>>>>>>So if you "only" want to show that S8 is better, you can - statistically
>>>>>>>speaking - stop now. If you want to "prove" that it is more than 20 Elo points
>>>>>>>better, you need a few more games indeed...
>>>>>>>
>>>>>>>Best regards - Munjong.
>>>>>>
>>>>>>
>>>>>>
>>>>>>It's great to see that at least one guy is able to correctly interpret match
>>>>>>results here.
>>>>>>
>>>>>>I hope you will post more often on this subject. Information on it is very much
>>>>>>needed here.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    Christophe
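
Munjong's remark that showing a margin of more than 20 Elo needs more games can be put into rough numbers. This is a sketch under strong, loudly stated assumptions: the observed 54.17% score and the per-game spread of the 300-game match stay unchanged as the match is extended, games are independent, and "proving" is taken to mean that the lower end of the 95% normal-approximation interval (z = 1.96) lies above the score expected for +20 Elo on the logistic curve. The helper names are hypothetical.

```python
import math

def elo_to_score(diff: float) -> float:
    """Logistic Elo curve (assumed model): rating difference -> expected score per game."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def games_needed(wins: int, losses: int, draws: int,
                 target_elo: float, z: float = 1.96) -> int:
    """Hypothetical helper: smallest N for which the lower end of the z-interval
    around the observed score lies above the expected score for target_elo.

    Assumes the observed score and per-game spread stay the same as N grows."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    sd = math.sqrt((wins + 0.25 * draws) / n - score ** 2)  # spread of one game's result
    margin = score - elo_to_score(target_elo)
    if margin <= 0:
        raise ValueError("observed score does not exceed the target")
    # need z * sd / sqrt(N) < margin, i.e. N > (z * sd / margin) ** 2
    return math.floor((z * sd / margin) ** 2) + 1

n = games_needed(wins=90, losses=65, draws=145, target_elo=20)
print(f"games needed for a lower bound above +20 Elo: {n}")
```

Under those assumptions the answer is a little under 3,000 games, roughly ten times the 300 already played; a smaller true score advantage would push the number higher still.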




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.