Computer Chess Club Archives


Subject: Re: Some stats...

Author: Richard Pijl

Date: 06:15:58 01/23/04


On January 23, 2004 at 06:04:43, Rolf Tueschen wrote:

>On January 22, 2004 at 22:30:03, Dann Corbit wrote:
>
>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>
>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>
>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>If my calculations are correct, I get the following:
>>>>>
>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>
>>>>>+90 -65 =145
>>>>>
>>>>>=> 162.5 - 137.5
>>>>>
>>>>>=> 54.17 %
>>>>>
>>>>>=>
>>>>>Elo difference = +29
>>>>>95 % confidence interval: [+1, +58]
>>>>>
>>>>>That means that based on this 300-game match (for this particular time control
>>>>>on this particular computer with these particular settings etc.), your best
>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>%.
>>>
>>>
>>>This is wrong. Stats don't work this way. In your example above, 1 Elo is as
>>>probable as 58 Elo. There is no way to postulate that Elo 29 is the "best"
>>>guess. With a defined confidence interval of 95% you get a range of 1 to 58 Elo
>>>points. Then you look at how your results differ for the two progs. All results
>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>that the two progs could be equally strong. You need at least Elo +-59 for a
>>>claim of being better or worse. - NB: you propose that the two progs are equally
>>>strong and then you test against it. You must top 58. [All this on the basis of
>>>a specific N of games, with the results calculated in Elo; I didn't follow the
>>>debate, but normally you calculate with scores from the games/matches, just to
>>>mention it.]
>>
>>That would be true if the shape of the normal curve were a box.  But it is a
>>bell shape.  Now, most of the area is in the middle, and the tails are
>>practically nil, so the variation near the center is considerable.  But the 1
>>Elo difference is not nearly so probable as 29.  However, a difference of 20 or
>>34 or something like that is very probable, since the curve is nearly flat on
>>top.
>>
>>To get the chances, just choose the distance from the center and do an
>>integration.  For standard distances, you can do a table lookup.
>>
>>Here is a crude approximation of a bell curve (not intended to be mathematically
>>perfect -- consider it a schematic):
>>
>>                         _
>>            s            X          s
>>
>>            |     ____---|---____   |
>>            |  __/       |       \__|
>>            | /          |          |
>>       +----|/-----------|----------|\----+
>>       |    |            |          | |   |
>>       |   /|            |          |  \  |
>>       |  / |            |          |   \ |
>>       | /  |            |          |    \|
>>       |/   |            |          |     \
>>     _/|    |            |          |     |\_
>>  __/  |    |            |          |     |  \__
>>
>>
>>_
>>X is the average (for a symmetric curve like this one, also the median and the
>>mode)
>>
>>s is +/- one standard deviation.  About 2/3 of the curve's area lies within one
>>standard deviation of the mean.  Two standard deviations cover slightly more
>>than 95% of the area.
>>
>>Very near the average, a bell curve is pretty flat (unless it is highly
>>leptokurtic or something) and so small variations of the central tendency are
>>very likely.
>>
>>The odds that the true figure sits in one of the tails are very slim.
>>
>>Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
>>standard deviations.  And so any outlier would have to sit in a slim slip of a
>>tail indeed.  Not to say it can't happen.  But it is a lot less likely than
>>being somewhere near the central estimate.
>
>
>Dann,
>this is not yet the solution. Let's keep it simple so that the average reader
>can follow. BTW, sensational drawing you gave.

This is as simple as statistics get. Very good explanation by Dann.
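
To put numbers on the bell curve for the match quoted above (+90 -65 =145), here is a minimal Python sketch. It assumes a normal approximation on the per-game score and the usual logistic Elo formula; the function names are made up for illustration, and this is not necessarily how EloStat or SSDF compute their figures.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def score_from_elo(d):
    """Expected score implied by an Elo difference d (inverse of the above)."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

def match_stats(wins, losses, draws, z=1.96):
    """Mean score, its standard error, and a z-sigma interval converted to Elo."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1 - p) ** 2 + losses * p ** 2 + draws * (0.5 - p) ** 2) / n
    se = math.sqrt(var / n)
    return p, se, (elo_from_score(p - z * se), elo_from_score(p + z * se))

def normal_cdf(x):
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p, se, (lo, hi) = match_stats(90, 65, 145)
elo = elo_from_score(p)

# Area of the bell curve above a 50 % score (normal approximation).
p_better = normal_cdf((p - 0.5) / se)

# Height of the curve at +1 Elo relative to its peak at the central estimate.
z_edge = (score_from_elo(1.0) - p) / se
rel_density = math.exp(-0.5 * z_edge ** 2)

print(f"score {p:.4f}, Elo {elo:+.0f}, 95% interval [{lo:+.0f}, {hi:+.0f}]")
print(f"area above 50%: {p_better:.3f}, height at +1 Elo vs. peak: {rel_density:.2f}")
```

The relative height at the edge of the interval is well below 1, which is the point of the drawing: a value near the edge is much less likely than one near the centre.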

>Let's make it step by step. I for one know that you are a bit on the wrong side
>with your message, but let's clarify this.

You're being rude here. Dann put a lot of effort into trying to explain something
to you. Something you obviously do not understand. Then you're telling him he is
on the wrong side?

>We are talking about stats, right?
>
>Now your picture represents exactly what? (First question.)

The bell curve. Any book on elementary statistics should cover that one. Look up
'normal distribution'. Or find a relevant site on the internet with Google,
like: http://davidmlane.com/hyperstat/

>Second question: as you know in stats we want to avoid making assumptions that
>cannot be proved because the value is varying and all "differences" could be by
>chance. Right?

1. You cannot _prove_ anything with statistics, as there is always a (very
small) theoretical chance that the weaker side wins everything.
2. All you do with statistics is put numbers on assumptions.
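
As a rough illustration of point 1, here is a small Python sketch. It assumes independent games and, purely for the sake of the example, takes the weaker side's per-game win probability to be its observed win rate in the 300-game match quoted above (65 wins).

```python
# ASSUMPTION: a fixed per-game win probability equal to the observed win rate
# of the weaker side (65 wins in 300 games); used for illustration only.
games = 300
p_weaker_wins_game = 65 / games

# Probability that the weaker side wins every single game,
# assuming the games are independent.
p_sweep = p_weaker_wins_game ** games
print(f"P(weaker side wins all {games} games) = {p_sweep:.3e}")
```

The result is far below 10^-100, but it is not zero, which is why statistics can only put a number on the claim and never prove it.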

>The first author here was talking about confidence intervals. With that we are
>in hypothesis testing. Etc.

See no. 2 above.

>Now we can put it together. Having said all that, what was it that you tried to
>show?

He tried to educate you. Seems a waste of time.
Richard.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.