Computer Chess Club Archives



Subject: Re: If 75 Games are not considered a Statistical proof, neither is the SSDF.

Author: Uri Blass

Date: 13:14:25 01/31/01



On January 31, 2001 at 14:17:48, Bruce Moreland wrote:

>On January 31, 2001 at 01:25:33, Uri Blass wrote:
>
>>I disagree that the probability of error in the conclusion is higher in the
>>60-40 case.
>
>I don't think that you are allowed to disagree, since my argument has a correct
>mathematical basis.
>
>The probability of 40-60 or worse coming up in a fair coin flipping contest is
>2.84%.  The probability of 0-10 coming up is 0.98%.

The probability of 0-10 coming up in 10 games is even lower than that
(2/2^10 = 1/512), but calculating these probabilities is not the question. The
question is what the probability is of being wrong when you make a decision.
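
As a quick check of the numbers being discussed, here is a small Python sketch
(assuming a fair coin, i.e. p = 1/2, and ignoring draws) that reproduces both
figures:

from math import comb

# probability that one specific side wins all 10 games of a fair match
p_sweep_one_side = 0.5 ** 10                    # 1/1024, about 0.098%
# probability that either side sweeps 10-0 (the 2/2^10 = 1/512 above)
p_sweep_either = 2 * 0.5 ** 10                  # 1/512, about 0.195%
# probability that one specific side scores 40 or fewer out of 100 games
p_40_60_or_worse = sum(comb(100, k) * 0.5 ** 100 for k in range(41))

print(p_sweep_one_side, p_sweep_either, p_40_60_or_worse)   # last is about 2.84%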
>
>>I think that +60 -40 is more significant for programmers than 10-0 result
>>because the probability that the weaker side wins 60-40 after you know that
>>60-40 happened is smaller than the probability that the weaker side won 10-0
>>after you know that 10-0 happened.
>
>This is mathematically false, see above.

The question is not what the probability is of making a wrong decision, but
what the probability is of being wrong given that you make a decision.
>
>>assume that the weaker side has probability p to win (p<1/2)
>>
>>The probability of 10-0 result is p^10+(1-p)^10
>>The probability of the weaker side to win 10-0 is p^10
>>
>>Conclusion:the probability of the weaker side to win 10-0 when 10-0 happened
>>is p^10/(p^10+(1-p)^10)
>>
>>The probability of a 60-40 result is (p^60*(1-p)^40+p^40*(1-p)^60)*C
>>where C=100!/(40!*60!)
>>The probability of the weaker side to win 60-40 is p^60*(1-p)^40*C
>
>I believe that what you are doing is defining the probability of a result that's
>exactly 40-60.  But you can't do that, you have to account for the possibility
>that the result will be worse, too.

I calculate the probability that the weaker side wins the match after knowing
that the result is 60-40, and that is the first step.

After seeing a result of 60-40 I can ignore the cases where the result is more
extreme than 60-40.
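
To make this concrete, here is a small Python sketch of that conditional
calculation. The per-game win probability p = 0.45 of the weaker side is only
an assumed illustration, draws are ignored, and which side is the weaker one is
taken as 50/50 a priori:

def prob_weaker_is_ahead(p, wins_ahead, wins_behind):
    # probability that the side currently ahead is really the weaker one,
    # given only the observed score; the binomial coefficient C cancels out
    a = p ** wins_ahead * (1 - p) ** wins_behind     # weaker side is the leader
    b = p ** wins_behind * (1 - p) ** wins_ahead     # stronger side is the leader
    return a / (a + b)

p = 0.45  # assumed per-game win probability of the weaker program
print(prob_weaker_is_ahead(p, 10, 0))    # leader is up 10-0:  about 0.12
print(prob_weaker_is_ahead(p, 60, 40))   # leader is up 60-40: about 0.02

With this assumed p, a 60-40 lead leaves a smaller chance that the leader is
really the weaker program than a 10-0 sweep does, which is the point of the
comparison above.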

>
>If we were to do a test where the result was a real number, we'd have fractional
>results, so the choice of a result window one unit wide is really arbitrary, so
>I believe you have to include results that are worse than 40-60.
>
>Even so, if two equal programs play, the odds that a particular one will lose
>40-60 are a little over 1%, according to your own formula (p^60*(1-p)^40*C),
>which agrees with my own result, which is still ten times better than the odds
>of an 0-10 result.
>
>>Conclusion the probability of the weaker side to win 60-40 after you know that
>>60-40 happened is p^60*(1-p)^40/(p^60*(1-p)^40+p^40*(1-p)^60)=
>>p^20/(p^20+(1-p)^20)
>>
>>I got the last equality by dividing the numerator and the denominator by p^40*(1-p)^40
>
>Assuming the programs are equal

If the programs are equal I get 1/2 for both cases.
The assumption is that the programs are not equal and that there is a small
difference.

>, you should just be able to square both
>percentages, which results in a lower chance for 0-10 twice than for <= 40-60
>twice, which is how I'd do it, as well as a lower chance for 0-10 than for
>40-60, which is how you did it.
>
>Not that any of this matters, since the idea that you have to have a duplicate
>sub-run with an opposite result is silly.
>
>>In general we can say that the probability of the weaker side to win by a
>>difference of n after you know that the difference is n is
>>p^n/(p^n+(1-p)^n)
>>
>>It means that if you want to know only which program is stronger then the most
>>logical test is to play until the difference is n games.
>
>I don't know where you are getting this math but this doesn't make any sense to
>me.

I explained that p^n/(p^n+(1-p)^n) is the probability that the weaker side is
the one ahead after you know the result.

This is a practical situation, because you often know the result, do not know
which version is better, and need to decide which version is better.
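
Here is a short sketch of how that error probability depends only on the margin
n and on the assumed per-game win probability p of the weaker side (the values
of p are illustrative and draws are ignored):

def prob_wrong_at_margin(p, n):
    # p^n / (p^n + (1-p)^n): chance that the side which is n games ahead
    # is really the weaker one (per-game win probability p < 1/2)
    return p ** n / (p ** n + (1 - p) ** n)

for p in (0.45, 0.48):
    for n in (5, 10, 20, 50):
        print(p, n, round(prob_wrong_at_margin(p, n), 4))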
>
>>I think that the level of confidence here is not important because the term
>>"level of confidence" is misleading.
>>
>>The % of the cases in which you get the right decision, out of the cases in
>>which you make a decision, is not the level of confidence, and it is the
>>important number.
>>
>>This number is a function of p and n.
>>
>>The only case where 10-0 may be more significant is when you do not know p,
>>so you suspect that p is bigger when you see a 10-0 result, but we need to
>>know the a priori distribution of p in order to decide that 10-0 is more
>>significant.
>
>If you start a match and get 10-0 right away, it proves that p is bigger, by any
>reasonable standard of proof.

I thought about trying to figure out whether a minor change to a version makes
the version better (for example, whether avoiding the recapture extension helps
Crafty).

My idea for a test to decide which version is better is to stop when there is a
difference of 10 (10 may be too small, and you can decide to stop only when
there is a difference of 100).

I agree that a 10-0 result suggests that the probability of the stronger side
winning is bigger (it means that p is smaller; I made an error when I said that
p is bigger), so in practice you have more chances to be wrong about the change
when you stop after more games (assuming that you stop at a difference of 10).
But in the cases where you are wrong after a lot of games, your mistake will be
smaller, so if you want to decide by a lot of games, the best test seems to me
to be to stop after you see a difference of a fixed constant (a constant of 10
seems too small, and I suggest a constant of 100).
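
Here is a small Monte Carlo sketch of this stopping rule (the 52% per-game win
probability of the better version is just an assumption, and draws are again
ignored) to estimate how often the rule picks the wrong version:

import random

def stop_at_margin(p_better, margin, max_games=100_000):
    # play until one version leads by `margin` games; return True if the
    # genuinely better version is the one declared the winner
    diff = 0  # better-version wins minus worse-version wins
    for _ in range(max_games):
        diff += 1 if random.random() < p_better else -1
        if abs(diff) >= margin:
            break
    return diff > 0   # if the game cap is ever hit, fall back to whoever leads

def error_rate(p_better, margin, trials=1000):
    wrong = sum(not stop_at_margin(p_better, margin) for _ in range(trials))
    return wrong / trials

for margin in (10, 50, 100):
    print(margin, error_rate(0.52, margin))

Under these assumptions the rule is a gambler's ruin race, and the theoretical
error probability is exactly the p^n/(p^n+(1-p)^n) from above, so the simulation
mainly serves as a sanity check on the chosen margin.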

I understand that there is a problem that tuning parameters against yourself may
not be productive against other opponents, but I believe that in most cases it
will be productive against other opponents.

Tuning against yourself may suggest throwing away knowledge that you do not use
in games against yourself, but if you test only changes to parameters like
extensions or small changes in the values of the pieces, then I think that
changes that are productive against yourself are usually productive against
other players.

I think that building a positional test suite and using it can be more
productive in practice, so playing a lot of games is not the best way to tune
parameters.

Uri


