Author: Rolf Tueschen
Date: 06:35:53 01/23/04
On January 23, 2004 at 09:15:58, Richard Pijl wrote:

>On January 23, 2004 at 06:04:43, Rolf Tueschen wrote:
>
>>On January 22, 2004 at 22:30:03, Dann Corbit wrote:
>>
>>>On January 22, 2004 at 20:15:14, Rolf Tueschen wrote:
>>>
>>>>On January 22, 2004 at 12:53:16, Christophe Theron wrote:
>>>>
>>>>>On January 21, 2004 at 20:00:12, Kolss wrote:
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>How many games you need depends on what you want to show, of course... :-)
>>>>>>If my calculations are correct, I get the following:
>>>>>>
>>>>>>Shredder 8 vs. Shredder 7.04:
>>>>>>
>>>>>>+90 -65 =145
>>>>>>
>>>>>>=> 162.5 - 137.5
>>>>>>
>>>>>>=> 54.17 %
>>>>>>
>>>>>>=>
>>>>>>Elo difference = +29
>>>>>>95 % confidence interval: [+1, +58]
>>>>>>
>>>>>>That means that based on this 300-game match (for this particular time control
>>>>>>on this particular computer with these particular settings etc.), your best
>>>>>>guess is that S8 is 29 Elo points better than S7.04 (highest likelihood for that
>>>>>>value); there is a 95 % chance that S8 is between 1 and 58 Elo points better;
>>>>>>and the likelihood that S8 is (at least 1 Elo point) better than S7.04 is
>>>>>>97.5 %.
>>>>
>>>>
>>>>This is wrong. Stats doesn't work this way. In your example above 1 Elo is as
>>>>probable as 58 Elo. There is no way to postulate that Elo 29 is the "best"
>>>>guess. With a defined confidence interval of 95% you get a variance of 1 to 58
>>>>Elo points. Then you look at how your results differ for two progs. All results
>>>>between 1 and 58 tell you nothing about differences! You still have to admit
>>>>that the two progs could be equally strong. You need at least Elo +-59 for a
>>>>claim of being better or worse. - NB you propose that the two progs are equally
>>>>strong and then you test against it. You must top 58. [all this on the basis of
>>>>a specific N of games, the results calculated in Elo; I didn't follow the debate
>>>>but normally you calculate with scores from the games/matches, just to
>>>>mention it]
>>>
>>>That would be true if the shape of the normal curve were a box. But it is a
>>>bell shape. Now, most of the area is in the middle, and the tails are
>>>practically nil, so the variation near the center is considerable. But the 1
>>>ELO difference is not nearly so probable as 29. However, a difference of 20 or
>>>34 or something like that is very probable, since the curve is nearly flat on
>>>top.
>>>
>>>To get the chances, just choose the distance from the center and do an
>>>integration. For standard distances, you can do a table lookup.
>>>
>>>Here is a crude approximation of a bell curve (not intended to be mathematically
>>>perfect -- consider it a schematic):
>>>
>>>[ASCII sketch of a bell curve: X-bar marks the center, and s marks one standard
>>>deviation on either side of it]
>>>
>>>_
>>>X is the average (for a symmetric curve like this one, also the median and the
>>>mode)
>>>
>>>s is +/- one standard deviation. About 2/3 of all the curve area fits under one
>>>standard deviation. 2 standard deviations will take up more than 95% of the
>>>area.
>>>
>>>Very near the average, a bell curve is pretty flat (unless it is highly
>>>leptokurtic or something) and so small variations of the central tendency are
>>>very likely.
>>>
>>>The odds that the true figure sits in one of the tails are very slim.
>>>
>>>Most of the programs that quote +/- figures (e.g. EloStat and SSDF) use 2
>>>standard deviations. And so any outlier would have to sit in a slim slip of a
>>>tail indeed. Not to say it can't happen. But it is a lot less likely than
>>>being somewhere near the central estimate.
>>
>>
>>Dann,
>>this is not yet the solution. Let's keep it simple so the average reader can
>>follow. BTW sensational drawing you gave.
>
>This is as simple as statistics get. Very good explanation by Dann.
>
>>Let's take it step by step. I for one know that you are a bit on the wrong side
>>with your message, but let's clarify this.
>
>You're being rude here. Dann put a lot of effort into trying to explain something
>to you. Something you obviously do not understand. Then you're telling him he is
>on the wrong side?
>
>>We are talking about stats, right?
>>
>>Now your picture represents exactly what? (First question.)
>
>The Bell curve. Any book on elementary statistics should cover that one. Look up
>'normal distribution'. Or find a relevant site on the internet with google,
>like: http://davidmlane.com/hyperstat/

Funny. You don't understand what I'm asking for. I repeat: what does this
picture (yes, it's called the Bell curve, tsk tsk) stand for? Can you understand
me? Then please tell me what Dann had in mind. I didn't ask what the picture is
called or such, but what it is supposed to show. But thanks for your message
anyway.

>
>>Second question: as you know, in stats we want to avoid making assumptions that
>>cannot be proved because the value is varying and all "differences" could be by
>>chance. Right?
>
>1. You cannot _prove_ anything with statistics, as there is always a (very
>small) theoretical chance that the weaker side wins everything.
>2. All that you do with statistics is putting numbers to assumptions.
>
>>The first author here was talking about confidence intervals. With that we are
>>in hypothesis testing. Etc.
>
>See no. 2 above.
>
>>Now we can put it together. Having said all that, what was it that you tried to
>>show?
>
>He tried to educate you. Seems a waste of time.

I see. It is interesting that you know exactly what he was trying to do, but you
didn't get the meaning of my question either. You can call it a waste of time.

Rolf

>Richard.
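
For readers who want to check the numbers quoted at the top of the thread, here is a minimal sketch of one common way to get them. The method is an assumption, since nobody in the thread says which formula was used: it takes the logistic Elo formula, a normal approximation for the match score, and a 1.96 standard-error interval. The names elo_from_score, match_summary and relative_density are made up for this illustration.

```python
import math


def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))


def match_summary(wins, losses, draws, z=1.96):
    """Point estimate and approximate confidence interval for a match result.

    Assumption: games are independent and the mean score is roughly normal.
    """
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n                 # mean score per game
    # Per-game variance of the score, where a game scores 1, 0.5 or 0.
    var = (wins * 1.0 + draws * 0.25) / n - p * p
    se = math.sqrt(var / n)                      # standard error of the mean score
    low, high = p - z * se, p + z * se
    # Normal-approximation area above an even score of 0.5.
    p_better = 0.5 * (1.0 + math.erf((p - 0.5) / (se * math.sqrt(2.0))))
    return {
        "score": p,
        "elo": elo_from_score(p),
        "elo_low": elo_from_score(low),
        "elo_high": elo_from_score(high),
        "p_better": p_better,
    }


def relative_density(z):
    """Height of the normal curve z standard deviations out, relative to its peak."""
    return math.exp(-0.5 * z * z)


if __name__ == "__main__":
    s = match_summary(90, 65, 145)               # Shredder 8 vs. Shredder 7.04
    print("score      %.4f" % s["score"])
    print("Elo        %+.1f" % s["elo"])
    print("95%% range  [%+.1f, %+.1f]" % (s["elo_low"], s["elo_high"]))
    print("P(better)  %.3f" % s["p_better"])
    print("density at interval edge vs. center: %.3f" % relative_density(1.96))
```

Under these assumptions the sketch gives roughly +29 Elo with an interval of about [+1, +58], in line with Kolss's figures. The last line bears on the box-versus-bell disagreement: at the edge of a 95 % interval the curve stands at only about 0.15 of its peak height, so under the normal model a value at the edge is far less likely than one near the center.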
Last modified: Thu, 15 Apr 21 08:11:13 -0700