Author: Dann Corbit
Date: 19:30:03 01/22/04
On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>
>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>
>>>Hi,
>>>
>>>How many games you need depends on what you want to show, of course... :-)
>>>If my calculations are correct, I get the following:
>>>
>>>Shredder 8 vs. Shredder 7.04:
>>>
>>>+90 -65 =145
>>>
>>>=> 162.5 - 137.5
>>>
>>>=> 54.17 %
>>>
>>>=>
>>>Elo difference = +29
>>>95 % confidence interval: [+1, +58]
>>>
>>>That means that based on this 300-game match (for this particular time control
>>>on this particular computer with these particular settings etc.), your best
>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>%.
>
>
>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>probable as 58 Elo. There is no way to hypostate that Elo 29 is the "best"
>guess. With a defined confidence int. of 95% you get a variance of 1 to 58 Elo
>points. Then you look how your results are differing for two progs. All results
>between 1 and 58 tell you nothing about differences! You still have to admit
>that the two progs could be equally strong. You need at least Elo +-59 for a
>claim of being better or worse. - NB you propose that the two progs are equally
>strong and then you test against it. You must top 58. [all this on the base of a
>specific N of games, the results calculated in Elo; I didn't follow the debate
>but normally you calculate with scores from the games/matches just for
>mentioning it]
That would be true if the normal curve were shaped like a box. But it is bell
shaped. Most of the area is in the middle and the tails are practically nil,
though there is still considerable room for variation near the center. A 1 Elo
difference is not nearly as probable as 29. However, a difference of 20 or 34
or something like that is very probable, since the curve is nearly flat on
top.
To get the chances, just choose the distance from the center and do an
integration. For standard distances, you can do a table lookup.
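As a rough sketch of that integration in Python: this assumes the estimate is normally distributed around +29 Elo, with a standard deviation taken as half the width of the quoted 95% interval divided by 1.96 (about 14.5 Elo). Both the standard deviation and the function names are my own assumptions for illustration, not something taken from any rating program.

```python
import math

MEAN = 29.0                      # central Elo estimate from the quoted match
SD = (58.0 - 1.0) / 2.0 / 1.96   # assumed: half-width of the 95% interval / 1.96

def normal_pdf(x, mean=MEAN, sd=SD):
    """Height of the bell curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean=MEAN, sd=SD):
    """Area under the bell curve to the left of x (closed form via erf)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_between(lo, hi):
    """Area under the curve between lo and hi."""
    return normal_cdf(hi) - normal_cdf(lo)

# Height of the curve at a few Elo differences, relative to the peak at +29.
for elo in (1, 20, 29, 34, 58):
    print(f"{elo:+d} Elo: {normal_pdf(elo) / normal_pdf(MEAN):.2f} of peak height")

print(f"P(+1 <= diff <= +58) = {prob_between(1, 58):.3f}")
print(f"P(diff > 0)          = {1.0 - normal_cdf(0):.3f}")
```

Under these assumptions the curve at +1 is only about a sixth as high as at +29, while at +20 and +34 it is still above 80% of the peak.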
Here is a crude approximation of a bell curve (not intended to be mathematically
perfect -- consider it a schematic):
                      _
          s           X           s
          |    ____---|---____    |
          | __/       |       \__ |
          |/          |          \|
    +----/|-----------|-----------|\----+
    |   / |           |           | \   |
    |  /  |           |           |  \  |
    | /   |           |           |   \ |
    |/    |           |           |    \|
  _/|     |           |           |     |\_
__/ |     |           |           |     | \__
_
X is the mean (for a symmetric curve like this one, also the median and the
mode)
s marks one standard deviation on either side of the mean. About 2/3 of the
area under the curve lies within one standard deviation of the mean, and a
little over 95% lies within 2 standard deviations.
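Those fractions are easy to check. For a normal curve, the area within k standard deviations of the mean is erf(k / sqrt(2)). A small Python check (the function name is just for illustration):

```python
import math

def area_within(k):
    """Fraction of a normal curve's area within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/- {k} sd: {area_within(k):.4f}")
# +/- 1 sd: 0.6827
# +/- 2 sd: 0.9545
# +/- 3 sd: 0.9973
```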
Very near the average, a bell curve is pretty flat (unless it is highly
leptokurtic or something), and so small deviations from the central estimate
are very likely.
The odds that the true figure sits in one of the tails are very slim.
Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
standard deviations. And so any outlier would have to sit in a slim sliver of a
tail indeed. Not to say it can't happen. But it is a lot less likely than
being somewhere near the central estimate.
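For what it's worth, the figures quoted at the top (+29, with a 95% interval of [+1, +58]) can be reproduced with a simple normal approximation. The sketch below is only my guess at a reasonable method, not necessarily what EloStat or SSDF actually do: it assumes each game is an independent result worth 1, 0.5 or 0, puts a 1.96 standard error interval around the score, and converts the endpoints with the usual logistic Elo formula. The function names are made up for illustration.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, losses, draws, z=1.96):
    """Return (estimate, lower, upper) in Elo for a match result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0): E[x^2] - E[x]^2.
    variance = (wins * 1.0 + draws * 0.25) / n - score * score
    # Standard error of the mean score over n games.
    std_err = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_err),
            elo_from_score(score + z * std_err))

# Shredder 8 vs. Shredder 7.04: +90 -65 =145
est, lo, hi = elo_interval(wins=90, losses=65, draws=145)
print(f"Elo difference {est:+.0f}, 95% interval [{lo:+.0f}, {hi:+.0f}]")
```

Note that this uses z = 1.96 for a 95% interval; a program that uses exactly 2 standard deviations would quote a slightly wider range.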
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.