Author: Rolf Tueschen
Date: 03:04:43 01/23/04
On January 22, 2004 at 22:30:03, Dann Corbit wrote:

>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>
>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>
>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>
>>>>Hi,
>>>>
>>>>How many games you need depends on what you want to show, of course... :-)
>>>>If my calculations are correct, I get the following:
>>>>
>>>>Shredder 8 vs. Shredder 7.04:
>>>>
>>>>+90 -65 =145
>>>>
>>>>=> 162.5 - 137.5
>>>>
>>>>=> 54.17 %
>>>>
>>>>=>
>>>>Elo difference = +29
>>>>95 % confidence interval: [+1, +58]
>>>>
>>>>That means that based on this 300-game match (for this particular time control
>>>>on this particular computer with these particular settings etc.), your best
>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>%.
>>
>>
>>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>>probable as 58 Elo. There is no way to hypostate that Elo 29 is the "best"
>>guess. With a defined confidence int. of 95% you get a variance of 1 to 58 Elo
>>points. Then you look how your results are differing for two progs. All results
>>between 1 and 58 tell you nothing about differences! You still have to admit
>>that the two progs could be equally strong. You need at least Elo +-59 for a
>>claim of being better or worse. - NB you propose that the two progs are equally
>>strong and then you test against it. You must top 58. [all this on the base of a
>>specific N of games, the results calculated in Elo; I didn't follow the debate
>>but normally you calculate with scores from the games/matches just for
>>mentioning it]
>
>That would be true if the shape of the normal curve were a box.  But it is a
>bell shape.
>Now, most of the area is in the middle, and the tails are
>practically nil, so the variation near the center is considerable.  But the 1
>ELO difference is not nearly so probable as 29.  However, a difference of 20 or
>34 or something like that is very probable, since the curve is nearly flat on
>top.
>
>To get the chances, just choose the distance from the center and do an
>integration.  For standard distances, you can do a table lookup.
>
>Here is a crude approximation of a bell curve (not intended to be mathematically
>perfect -- consider it a schematic):
>
>[ASCII schematic of a bell curve: X-bar marks the center, s marks one standard
>deviation on either side]
>
>_
>X is the average (for a symmetric curve like this one, also the median and the
>mode)
>
>s is +/- one standard deviation.  About 2/3 of all the curve area fits under one
>standard deviation.  2 standard deviations will take up more than 95% of the
>area.
>
>Very near the average, a bell curve is pretty flat (unless it is highly
>leptokurtic or something) and so small variations of the central tendency are
>very likely.
>
>The odds that the true figure sits in one of the tails are very slim.
>
>Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
>standard deviations.  And so any outlier would have to sit in a slim slip of a
>tail indeed.  Not to say it can't happen.  But it is a lot less likely than
>being somewhere near the central estimate.

Dann, this is not yet the solution. Let's keep it simple so the average reader can follow. BTW sensational drawing you gave. Let's take it step by step. I for one know that you are a bit on the wrong side with your message, but let's clarify this. We are talking about stats, right? Now your picture represents exactly what? (First question.)
Second question: as you know, in stats we want to avoid making assumptions that cannot be proved, because the value varies and all "differences" could be due to chance. Right? The first author here was talking about confidence intervals. With that we are in hypothesis testing. Etc. Now we can put it together. Having said all that, what was it that you tried to show?

Rolf
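
For readers who want to check the figures being argued over, here is a rough sketch in Python. It is not taken from any of the posts and is not how EloStat or SSDF compute their numbers. It assumes the usual logistic Elo formula, a normal approximation to the mean score per game, and a z value of 1.96 for the 95 % level. All function names (elo_from_score, match_summary, normal_cdf, relative_density) are made up for this illustration.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_summary(wins, losses, draws, z=1.96):
    """Return (score, standard error, Elo estimate, Elo low, Elo high)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (score, se, elo_from_score(score),
            elo_from_score(score - z * se), elo_from_score(score + z * se))

def normal_cdf(x):
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relative_density(z):
    """Height of the bell curve z standard errors from its center, relative to the peak."""
    return math.exp(-0.5 * z * z)

# Match result quoted in the thread: +90 -65 =145 (assumed to be 300 independent games).
score, se, elo, elo_lo, elo_hi = match_summary(90, 65, 145)
print(f"score {score:.4f}, Elo {elo:+.0f}, 95% interval [{elo_lo:+.0f}, {elo_hi:+.0f}]")
print(f"curve height at the interval edge vs. the center: {relative_density(1.96):.2f}")
print(f"area above a 50% score: {normal_cdf((score - 0.5) / se):.3f}")
```

Under these assumptions the output should land close to the Elo estimate and interval quoted by Kolss. The relative_density value shows what the "bell, not box" remark means in numbers: the curve at the edge of the interval is much lower than at the center. Whether that curve may be read as a statement about where the true strength difference lies is exactly the question under debate in this thread, so treat the last two printed figures as an illustration of the normal approximation only.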
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.