Author: Richard Pijl
Date: 06:15:58 01/23/04
On January 23, 2004 at 06:04:43, Rolf Tueschen wrote:
>On January 22, 2004 at 22:30:03, Dann Corbit wrote:
>
>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>
>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>
>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>If my calculations are correct, I get the following:
>>>>>
>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>
>>>>>+90 -65 =145
>>>>>
>>>>>=> 162.5 - 137.5
>>>>>
>>>>>=> 54.17 %
>>>>>
>>>>>=>
>>>>>Elo difference = +29
>>>>>95 % confidence interval: [+1, +58]
>>>>>
>>>>>That means that based on this 300-game match (for this particular time control
>>>>>on this particular computer with these particular settings etc.), your best
>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is 97.5
>>>>>%.
>>>
>>>
>>>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>>>probable as 58 Elo. There is no way to hypostate that Elo 29 is the "best"
>>>guess. With a defined confidence int. of 95% you get a variance of 1 to 58 Elo
>>>points. Then you look how your results are differing for two progs. All results
>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>that the two progs could be equally strong. You need at least Elo +-59 for a
>>>claim of being better or worse. - NB you propose that the two progs are equally
>>>strong and then you test against it. You must top 58. [all this on the base of a
>>>specific N of games, the results calculated in Elo; I didn't follow the debate
>>>but normally you calculate with scores from the games/matches just for
>>>mentioning it]
>>
>>That would be true if the shape of the normal curve were a box. But it is a
>>bell shape.
>>Now, most of the area is in the middle, and the tails are
>>practically nil, so the variation near the center is considerable. But the 1
>>Elo difference is not nearly so probable as 29. However, a difference of 20 or
>>34 or something like that is very probable, since the curve is nearly flat on
>>top.
>>
>>To get the chances, just choose the distance from the center and do an
>>integration. For standard distances, you can do a table lookup.
>>
>>Here is a crude approximation of a bell curve (not intended to be mathematically
>>perfect -- consider it a schematic):
>>
>>                      _
>>         s            X          s
>>
>>         |     ____---|---____   |
>>         |  __/       |       \__|
>>         |  /         |          |
>>    +----|/-----------|----------|\----+
>>    |    |            |          | |   |
>>    |   /|            |          |  \  |
>>    |  / |            |          |   \ |
>>    | /  |            |          |    \|
>>    |/   |            |          |     \
>>  _/|    |            |          |     |\_
>>__/ |    |            |          |     | \__
>>
>>
>>_
>>X is the average (for a symmetric curve like this one, also the median and the
>>mode)
>>
>>s is +/- one standard deviation. About 2/3 of all the curve area fits within one
>>standard deviation. 2 standard deviations will take up more than 95% of the
>>area.
>>
>>Very near the average, a bell curve is pretty flat (unless it is highly
>>leptokurtic or something) and so small variations of the central tendency are
>>very likely.
>>
>>The odds that the true figure sits in one of the tails are very slim.
>>
>>Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
>>standard deviations. And so any outlier would have to sit in a slim slip of a
>>tail indeed. Not to say it can't happen. But it is a lot less likely than
>>being somewhere near the central estimate.
>
>
>Dann,
>this is not yet the solution. Let's keep it simple so the average reader can
>follow. BTW sensational drawing you gave.

This is as simple as statistics get. Very good explanation by Dann.

>Let's make it step by step. I for one know that you are a bit on the wrong side
>with your message, but let's clarify this.

You're being rude here.
Dann put a lot of effort into trying to explain something to you, something you
obviously do not understand. And then you tell him he is on the wrong side?

>We are talking about stats, right?
>
>Now your picture represents exactly what? (First question.)

The bell curve. Any book on elementary statistics should cover that one. Look up
'normal distribution', or find a relevant site on the internet with Google, like:
http://davidmlane.com/hyperstat/

>Second question: as you know in stats we want to avoid making assumptions that
>cannot be proved because the value is varying and all "differences" could be by
>chance. Right?

1. You cannot _prove_ anything with statistics, as there is always a (very small)
theoretical chance that the weaker side wins everything.
2. All that you do with statistics is putting numbers to assumptions.

>The first author here was talking about confidence intervals. With that we are
>in hypothesis testing. Etc.

See no. 2 above.

>Now we can put it together. Having said all that, what was it that you tried to
>show?

He tried to educate you. Seems a waste of time.

Richard.
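Kolss's figures and Dann's bell-curve argument can be checked with a short Python sketch. It is a minimal sketch under stated assumptions, not necessarily how Kolss computed his numbers (the thread does not say): it uses a normal approximation to the mean per-game score (win = 1, draw = 0.5, loss = 0), estimates the per-game variance from the observed win/draw/loss counts, uses z = 1.96 for a two-sided 95% interval, and converts scores to Elo with the logistic Elo formula. All function names are hypothetical. Under those assumptions the +90 -65 =145 result gives 54.17%, about +29 Elo, and an interval of roughly [+1, +58]. It also illustrates Dann's point that the curve is not a box: the height of the approximate curve at +1 Elo is only about 15% of its height at +29, and the standard normal areas within 1 and 2 standard deviations come out near 68% and 95%.

```python
import math

# Minimal sketch (assumption): normal approximation to the mean per-game score,
# with the per-game variance estimated from the observed win/draw/loss counts.


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def score_from_elo(d: float) -> float:
    """Inverse of elo_from_score: expected score for an Elo difference d."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def match_summary(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (score fraction, standard error, Elo estimate, (low, high) Elo bounds)."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score: E[x^2] - E[x]^2 with x in {1, 0.5, 0}.
    var = (wins * 1.0 + draws * 0.25) / n - p * p
    se = math.sqrt(var / n)
    # Interval is built on the score scale, then both ends are converted to Elo.
    low, high = p - z * se, p + z * se
    return p, se, elo_from_score(p), (elo_from_score(low), elo_from_score(high))


def prob_better_than(p: float, se: float, elo: float) -> float:
    """Approximate chance that the true Elo difference exceeds `elo`.

    Shortcut (assumption): treats the sampling curve as a curve for the true value.
    """
    return normal_cdf((p - score_from_elo(elo)) / se)


def relative_height(p: float, se: float, elo: float) -> float:
    """Height of the approximate bell curve at `elo`, relative to its peak."""
    z = (score_from_elo(elo) - p) / se
    return math.exp(-0.5 * z * z)


if __name__ == "__main__":
    p, se, elo, (low, high) = match_summary(wins=90, losses=65, draws=145)
    print(f"score {p:.2%}, Elo {elo:+.0f}, 95% interval [{low:+.0f}, {high:+.0f}]")
    print(f"P(diff > 1 Elo) ~ {prob_better_than(p, se, 1.0):.3f}")
    print(f"height at +1 Elo relative to the peak: {relative_height(p, se, 1.0):.2f}")
    # Area of the standard normal curve within +/- k standard deviations.
    for k in (1, 2):
        print(f"within +/-{k} sd: {normal_cdf(k) - normal_cdf(-k):.4f}")
```

Because the interval is formed on the score scale and then mapped through the nonlinear Elo formula, it comes out slightly asymmetric around +29. The `prob_better_than` shortcut lands close to Kolss's 97.5% for "at least 1 Elo point better".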
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.