Computer Chess Club Archives


Subject: Re: Some stats...

Author: Rolf Tueschen

Date: 06:35:53 01/23/04


On January 23, 2004 at 09:15:58, Richard Pijl wrote:

>On January 23, 2004 at 06:04:43, Rolf Tueschen wrote:
>
>>On January 22, 2004 at 22:30:03, Dann Corbit wrote:
>>
>>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>>
>>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>>
>>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>>If my calculations are correct, I get the following:
>>>>>>
>>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>>
>>>>>>+90 -65 =145
>>>>>>
>>>>>>=> 162.5 - 137.5
>>>>>>
>>>>>>=> 54.17 %
>>>>>>
>>>>>>=>
>>>>>>Elo difference = +29
>>>>>>95 % confidence interval: [+1, +58]
>>>>>>
>>>>>>That means that based on this 300-game match (for this particular time control
>>>>>>on this particular computer with these particular settings etc.), your best
>>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>>%.
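
The figures quoted above can be checked with a short calculation. This is a minimal sketch, not necessarily the method the original poster used: it assumes the usual logistic Elo formula and a normal approximation for the match score, with the per-game variance taken from the observed win/draw/loss counts. The function names are illustrative.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_elo_interval(wins, losses, draws, z=1.96):
    """Return (score, elo, elo_low, elo_high) for a match result.

    Assumption: the mean score is approximately normally distributed,
    so the interval is score +/- z standard errors, mapped to Elo.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is 1 (win), 0.5 (draw) or 0 (loss).
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    std_err = math.sqrt(variance / n)
    return (
        score,
        elo_from_score(score),
        elo_from_score(score - z * std_err),
        elo_from_score(score + z * std_err),
    )

score, elo, elo_low, elo_high = match_elo_interval(wins=90, losses=65, draws=145)
print(f"score = {score:.2%}")
print(f"Elo difference = {elo:+.0f}, 95% interval = [{elo_low:+.0f}, {elo_high:+.0f}]")
```

Under these assumptions the result agrees with the quoted 54.17 %, +29 and [+1, +58] after rounding.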
>>>>
>>>>
>>>>This is wrong. Stats don't work this way. In your example above 1 Elo is as
>>>>probable as 58 Elo. There is no way to hypothesize that Elo 29 is the "best"
>>>>guess. With a defined confidence interval of 95% you get a variance of 1 to 58 Elo
>>>>points. Then you look how your results are differing for two progs. All results
>>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>>that the two progs could be equally strong. You need at least Elo +-59 for a
>>>>claim of being better or worse. - NB you propose that the two progs are equally
>>>>strong and then you test against it. You must top 58. [all this on the basis of a
>>>>specific N of games, with the results calculated in Elo; I didn't follow the debate,
>>>>but normally you calculate with scores from the games/matches, just to
>>>>mention it]
>>>
>>>That would be true if the shape of the normal curve were a box.  But it is a
>>>bell shape.  Now, most of the area is in the middle, and the tails are
>>>practically nil, so the variation near the center is considerable.  But the 1
>>>ELO difference is not nearly so probable as 29.  However, a difference of 20 or
>>>34 or something like that is very probable, since the curve is nearly flat on
>>>top.
>>>
>>>To get the chances, just choose the distance from the center and do an
>>>integration.  For standard distances, you can do a table lookup.
>>>
>>>Here is a crude approximation of a bell curve (not intended to be mathematically
>>>perfect -- consider it a schematic):
>>>
>>>                         _
>>>            s            X          s
>>>
>>>            |     ____---|---____   |
>>>            |  __/       |       \__|
>>>            | /          |          |
>>>       +----|/-----------|----------|\----+
>>>       |    |            |          | |   |
>>>       |   /|            |          |  \  |
>>>       |  / |            |          |   \ |
>>>       | /  |            |          |    \|
>>>       |/   |            |          |     \
>>>     _/|    |            |          |     |\_
>>>  __/  |    |            |          |     |  \__
>>>
>>>
>>>_
>>>X is the average, i.e. the mean (for a symmetric curve like this one, also the
>>>median and the mode)
>>>
>>>s is +/- one standard deviation.  About 2/3 of the area under the curve lies
>>>within one standard deviation of the average.  Two standard deviations on either
>>>side take in more than 95% of the area.
>>>
>>>Very near the average, a bell curve is pretty flat (unless it is highly
>>>leptokurtic or something) and so small variations of the central tendency are
>>>very likely.
>>>
>>>The odds that the true figure sits in one of the tails are very slim.
>>>
>>>Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
>>>standard deviations.  And so any outlier would have to sit in a slim slip of a
>>>tail indeed.  Not to say it can't happen.  But it is a lot less likely than
>>>being somewhere near the central estimate.
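
The "table lookup" mentioned above can be reproduced with the standard library. This is a minimal sketch under the assumption that the estimate follows a normal distribution; the helper names are illustrative. It computes the area within one and two standard deviations, the height of the curve at the edge of a 95 % interval (1.96 standard deviations out) relative to its peak, and the one-sided area above the lower edge of that interval.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Height of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def area_within(k):
    """Area within +/- k standard deviations of the center."""
    return normal_cdf(k) - normal_cdf(-k)

within_1sd = area_within(1.0)
within_2sd = area_within(2.0)
# Curve height at the edge of a 95% interval, as a fraction of the peak height.
edge_vs_center = normal_pdf(1.96) / normal_pdf(0.0)
# One-sided area above the lower edge of a 95% interval.
above_lower_edge = normal_cdf(1.96)

print(f"within 1 sd: {within_1sd:.4f}")
print(f"within 2 sd: {within_2sd:.4f}")
print(f"height at 1.96 sd relative to peak: {edge_vs_center:.4f}")
print(f"area above the lower 95% edge: {above_lower_edge:.4f}")
```

Under this model the curve at the edge of the interval is only about 15 % as high as at the center, which is the sense in which the end points are much less probable than the central estimate.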
>>
>>
>>Dann,
>>this is not yet the solution. Let's keep it simple so the average reader can
>>follow. BTW sensational drawing you gave.
>
>This is as simple as statistics get. Very good explanation by Dann.
>
>>Let's make it step by step. I for one know that you are a bit on the wrong side
>>with your message, but let's clarify this.
>
>You're being rude here. Dann put a lot of effort into trying to explain something
>to you. Something you obviously do not understand. Then you're telling him he is on
>the wrong side?
>
>>We are talking about stats, right?
>>
>>Now your picture represents exactly what? (First question.)
>
>The Bell curve. Any book on elementary statistics should cover that one. Look up
>'normal distribution'. Or find a relevant site on the internet with google,
>like: http://davidmlane.com/hyperstat/

Funny. You don't understand what I'm asking. I repeat: what does this picture
(yes, it's called the bell curve, tsk tsk) stand for? Do you understand me?
Then please tell me what Dann had in mind. I didn't ask what the picture is
called, but what it is supposed to show. But thanks for your message anyway.

>
>>Second question: as you know in stats we want to avoid making assumptions that
>>cannot be proved because the value is varying and all "differences" could be by
>>chance. Right?
>
>1. You cannot _prove_ anything with statistics, as there is always a (very
>small) theoretical chance that the weaker side wins everything.
>2. All that you do with statistics is putting numbers to assumptions.
>
>>The first author here was talking about confidence intervals. With that we are
>>in hypothesis testing. Etc.
>
>See no. 2 above.
>
>>Now we can put it together. Having said all that, what was it that you tried to
>>show?
>
>He tried to educate you. Seems a waste of time.


I see. It is interesting that you know exactly what he was trying to do, but you
didn't get the meaning of my question either.

You can call it a waste of time.

Rolf


>Richard.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.