Computer Chess Club Archives



Subject: Re: I would call this SPORT christophe !

Author: Danniel Corbit

Date: 19:58:18 08/09/98



On August 09, 1998 at 21:00:27, Bruce Moreland wrote:
[snip]
>You can start flipping the coin, and if at any point you get a result that
>indicates that the coin is unfair, you can stop.
I can use this rule to "prove" that every fair coin is unfair.  A sequence of
coin flips is a random walk.  Every Markov-type process will drift away from its
mean at some point in time, but over the long haul it will have a particular
average.  [The drifts above and below will tend to cancel out.]
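A minimal simulation (Python; the flip counts and 5% level are my illustrative choices) makes this concrete: if you test after every flip and stop the moment the running total looks "significant at 5%", a perfectly fair coin gets flagged as unfair far more often than 5% of the time.

```python
import random
from math import sqrt, erfc

def flagged_unfair(max_flips=500, alpha=0.05):
    """Flip a fair coin up to `max_flips` times, running a two-sided
    z-test after every flip; return True if the running result ever
    looks 'significant' at level alpha, i.e. the stopping rule would
    declare the coin unfair."""
    heads = 0
    for n in range(1, max_flips + 1):
        heads += random.random() < 0.5
        if n < 10:
            continue  # the normal approximation needs a handful of flips
        z = abs(heads - n / 2) / sqrt(n / 4)
        p_value = erfc(z / sqrt(2))  # two-sided tail probability
        if p_value < alpha:
            return True
    return False

random.seed(1)
trials = 200
rate = sum(flagged_unfair() for _ in range(trials)) / trials
# With a *fixed* sample size the false-positive rate would be about 5%;
# peeking after every flip inflates it well beyond that.
print(f"fair coins flagged unfair: {rate:.0%}")
```

This is exactly the drift in the random walk doing its work: with enough peeks, the running total of a fair coin will eventually wander far enough from the mean to trip any fixed significance threshold.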

>For instance, you may decide to consider the coin to be unfair if you get a
>result that will happen less than 5% of the time if the coin is fair.
>
>If you flip the coin five times and it comes up heads all five times, the odds
>are 1/32 that this will happen with a fair coin, and this is less than 5%, so
>you can conclude that the coin isn't fair.
>
>You can say, "this coin is at least slightly more likely to come up heads than
>tails."
I don't think it demonstrates that.  If you have twenty teams follow this
method, then on average half of the teams will find coins that are unfair
+heads, and half will find coins that are unfair +tails.  [Of course, you could
get all +heads or all +tails, or even all heads or all tails.]
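A quick sketch of that point, assuming each of many teams flips a fair coin five times and flags it as unfair whenever all five flips agree:

```python
import random

random.seed(2)
teams, flips = 100_000, 5
plus_heads = plus_tails = 0
for _ in range(teams):
    heads = sum(random.random() < 0.5 for _ in range(flips))
    if heads == flips:    # five heads in a row: flagged "unfair +heads"
        plus_heads += 1
    elif heads == 0:      # five tails in a row: flagged "unfair +tails"
        plus_tails += 1

# Each count lands near teams / 32, and the two sides come out roughly
# balanced: the flagged "unfair" coins split both ways, as argued above.
print(f"+heads: {plus_heads}, +tails: {plus_tails}")
```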

>If you don't get such a dramatic result, and there are tails mixed in with the
>heads, it will take you a much longer time to find an unfair coin.  But when you
>finally do get a 95% confidence that the coin is unfair, you are not one bit
>more certain that the coin is unfair than if you flipped it five times and it
>came up heads each time.
>
>Chess isn't coin flipping.  White has a better chance of winning than black, and
>there are draws.  I don't know exactly what effect this has, but I think the
>existence of draws decreases the number of trials you need to get significance if
>you have a wipeout result.
>
>So I think 4-0 actually turns out to be a significant result.  If you score 4-0,
>you can say that there is a very good chance that the one with the wins is
>better than the ones with the losses.
There is clearly some truth to this statement.  The question is, "What is the
confidence that this program is better?"
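A rough calculation, under the assumption that the two programs are equally strong and draw with some per-game rate d, shows why draws make a 4-0 sweep more significant, as suggested above:

```python
# Null hypothesis: the two programs are equally strong, so
# P(win) = P(loss) = (1 - d) / 2, where d is the per-game draw rate.
# The chance of one particular program sweeping 4-0 is then a simple
# product of four per-game probabilities, and it shrinks quickly as
# draws become more common.
p_sweep = {d: ((1 - d) / 2) ** 4 for d in (0.0, 0.3, 0.5)}
for d, p in p_sweep.items():
    print(f"draw rate {d:.0%}: P(4-0 sweep by program A) = {p:.4f}")
```

With no draws, the one-sided probability of a sweep is 1/16 = 0.0625; at a 50% draw rate it drops to about 0.0039, which is why a 4-0 chess result carries more weight than five heads in a row.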

>You can't say this if you pick out a string of 4 wins in a row in the midst of a
>longer match, since you might be selecting a fluke case, but if you just start
>from scratch, and get 4-0, you should be able to stop.  In fact I think you
>might be able to stop if you get 3.5 - 0.5, but I am less certain of this case.
>Someone who has more statistics than I may be willing to comment on this.
>
>Now, if you do 30 games, you might think you are safe, but you are probably not.
> I'm sure you can get some results where one side wins by a few games, perhaps
>even quite a few games, and you still may not have proven with reasonable
>confidence that one program is stronger than the other.
The more trials you perform, the more probable it is that you have a correct
result, unless the experiment is flawed [e.g. someone might always flip the coin
in exactly the same way, with fine muscle control that influences the outcome].
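A small illustration, using a hypothetical coin with a 55% heads bias: the chance that a fixed-length experiment points the right way grows steadily with the number of trials.

```python
import random

random.seed(3)
P_TRUE = 0.55  # hypothetical coin, slightly biased toward heads

def majority_heads(n):
    """One experiment: flip the biased coin n times (n odd, so there are
    no ties) and report whether heads won the majority."""
    return sum(random.random() < P_TRUE for _ in range(n)) > n / 2

results = {}
for n in (11, 101, 1001):
    # Repeat the n-flip experiment 2000 times and count how often the
    # majority correctly points toward heads.
    results[n] = sum(majority_heads(n) for _ in range(2000)) / 2000
    print(f"{n:5d} flips: majority says heads in {results[n]:.0%} of runs")
```

The uncertainty in the observed frequency shrinks roughly like one over the square root of the number of flips, so a small bias that is invisible at 11 flips becomes nearly certain at 1001.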

>And remember, that what I'm talking about is "stronger", not "markedly
>stronger".  If you get 4-0 you don't prove that one program is hundreds of
>points stronger than the other one, just that it is at least slightly stronger.
>
>I'm sure there is a way you can say, "I have shown that there is a 95% chance
>that program A is at least 30 Elo points stronger than program B", but I'm not
>sure exactly how to do it with confidence.  And I think that in practice, in
>order to show this, program A has to beat program B pretty badly, although less
>badly if you do more trials.
There is a way to accomplish exactly this goal.  First, you calculate an
estimate of the strength relationship, together with a range that brackets it.
This range is called the confidence interval.  With enough data points, you can
make the confidence interval as narrow as you like.  The higher the confidence
you demand in the relationship, the wider the interval will be, but it narrows
as the number of trials grows.  There is also a second measure, called the
prediction interval.  Once you have amassed a large amount of data and computed
the confidence interval, you can use the same data to predict the outcome of a
future experiment.  Again, the certainty you demand determines the width of the
band.  So, for instance, if you want a 95% probable outcome prediction instead
of a 67% one, the estimate might widen from {for example} "x beats y somewhere
between (6:4) and (8:2)" to "x vs. y will score anywhere from (5:5) to (10:0)".

Unless one program is truly dominant, it will be hard to narrow the predictions.
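As one concrete sketch of a confidence interval on match results (the 18/30 score and the Wilson-interval method are my illustrative choices, not anything from the post): compute an interval for the expected score per game, then translate its endpoints into an Elo difference with the standard logistic formula.

```python
from math import sqrt, log10

def wilson_interval(score, games, z=1.96):
    """Approximate 95% confidence interval (Wilson score interval) for
    the true expected score per game, given `score` points out of
    `games` games (draws count 0.5).  Treating the score fraction as
    binomial is an approximation, since draws make each game's outcome
    ternary rather than binary."""
    p = score / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

def elo_diff(p):
    """Elo difference implied by an expected score p (logistic model)."""
    return -400 * log10(1 / p - 1)

lo, hi = wilson_interval(18, 30)  # hypothetical result: 18/30 points
print(f"expected-score CI: {lo:.3f} .. {hi:.3f}")
print(f"Elo-difference CI: {elo_diff(lo):+.0f} .. {elo_diff(hi):+.0f}")
```

Even a clear 18/30 result over thirty games leaves an interval spanning roughly -50 to +200 Elo, which is why it takes a surprisingly large number of games to pin the relationship down.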


>Some simple statistical concepts were regarded as top secrets during World War
>II, because they allowed researchers to prove that one drug was effective with
>sometimes very few trials.  Not everyone had figured this out by then.
I hope they try something a million times before they use it on me.
;-)




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.