Author: Uri Blass
Date: 13:14:25 01/31/01
On January 31, 2001 at 14:17:48, Bruce Moreland wrote:

>On January 31, 2001 at 01:25:33, Uri Blass wrote:
>
>>I disagree that the probability of error in the conclusion is higher in the
>>60-40 case.
>
>I don't think that you are allowed to disagree, since my argument has a correct
>mathematical basis.
>
>The probability of 40-60 or worse coming up in a fair coin flipping contest is
>2.84%. The probability of 0-10 coming up is 0.98%.

The probability of a 10-0 sweep in 10 games is even lower than that (2/2^10 = 1/512 for a sweep by either side), but calculating these probabilities is not the question. The question is what the probability of being wrong is when you make a decision.

>>I think that +60 -40 is more significant for programmers than 10-0 result
>>because the probability that the weaker side wins 60-40 after you know that
>>60-40 happened is smaller than the probability that the weaker side won 10-0
>>after you know that 10-0 happened.
>
>This is mathematically false, see above.

The question is not what the probability of making a wrong decision is, but what the probability of being wrong is when you make a decision.

>>assume that the weaker side has probability of p to win(p<1/2)
>>
>>The probability of 10-0 result is p^10+(1-p)^10
>>The probability of the weaker side to win 10-0 is p^10
>>
>>Conclusion:the probability of the weaker side to win 10-0 when 10-0 happened
>>is p^10/(p^10+(1-p)^10)
>>
>>The probability of 60-40 result is (p^60*(1-p)^40+p^40*(1-p)^60)*C
>>When C=100!/(40!*60!)
>>The probability of the weaker side to win 60-40 is p^60*(1-p)^40*C
>
>I believe that what you are doing is defining the probability of a result that's
>exactly 40-60. But you can't do that, you have to account for the possibility
>that the result will be worse, too.

I am calculating the probability that the weaker side won the match, given that the result is 60-40, and that is the first step. After seeing a result of 60-40 I can ignore the cases in which the margin is larger than 60-40.
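The fair-coin figures quoted above can be recomputed with a short Python sketch. This is only an illustration: the function and variable names are my own assumptions, draws are ignored, and each game is treated as an independent coin flip.

```python
from math import comb

def prob_exact(wins: int, games: int, p: float) -> float:
    """Probability of exactly `wins` wins in `games` independent games."""
    return comb(games, wins) * p**wins * (1 - p)**(games - wins)

def prob_at_most(wins: int, games: int, p: float) -> float:
    """Probability of `wins` or fewer wins in `games` independent games."""
    return sum(prob_exact(k, games, p) for k in range(wins + 1))

fair = 0.5  # two equal programs, no draws (assumption)

tail_40_60 = prob_at_most(40, 100, fair)  # one given side scores 40-60 or worse
exact_40_60 = prob_exact(40, 100, fair)   # one given side scores exactly 40-60
one_side_0_10 = prob_exact(0, 10, fair)   # one given side loses 0-10
either_side_10_0 = 2 * one_side_0_10      # a sweep by either side: 2/2^10

print(f"40-60 or worse:   {tail_40_60:.4%}")
print(f"exactly 40-60:    {exact_40_60:.4%}")
print(f"0-10, one side:   {one_side_0_10:.4%}")
print(f"10-0, either way: {either_side_10_0:.4%}")
```

These are all probabilities of a result under the assumption that the programs are equal. None of them is the probability that the weaker side is the one that won, which is the number I am arguing about.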
>
>If we were to do a test where the result was a real number, we'd have fractional
>results, so the choice of a result window one unit wide is really arbitrary, so
>I believe you have to include results that are worse than 40-60.
>
>Even so, if two equal programs play, the odds that a particular one will lose
>40-60 are a little over 1%, according to your own formula (p^60*(1-p)^40*C),
>which agrees with my own result, which is still ten times better than the odds
>of an 0-10 result.
>
>>Conclusion the probability of the weaker side to win 60-40 after you know that
>>60-40 happened is p^60*(1-p)^40/(p^60*(1-p)^40+p^40*(1-p)^60)=
>>p^20/(p^20+(1-p)^20)
>>
>>I got the last equality by dividing the numerator and the denominator by
>>p^40*(1-p)^40
>
>Assuming the programs are equal

If the programs are equal I get 1/2 in both cases. The assumption is that the programs are not equal and that there is a small difference between them.

>, you should just be able to square both
>percentages, which results in a lower chance for 0-10 twice than for <= 40-60
>twice, which is how I'd do it, as well as a lower chance for 0-10 than for
>40-60, which is how you did it.
>
>Not that any of this matters, since the idea that you have to have a duplicate
>sub-run with an opposite result is silly.
>
>>In general we can say that the probability of the weaker side to win by a
>>difference of n after you know that the difference is n is
>>p^n/(p^n+(1-p)^n)
>>
>>It means that if you want to know only which program is stronger then the most
>>logical test is to play until the difference is n games.
>
>I don't know where you are getting this math but this doesn't make any sense to
>me.

I explained that p^n/(p^n+(1-p)^n) is the probability that the weaker side won, given that you know the result (a margin of n games). This is a practical situation, because you often know the result without knowing which version is better, and you need to decide which version is better.
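As a rough check on p^n/(p^n+(1-p)^n), here is a minimal Python sketch. It evaluates the formula for a margin of 10 (the 10-0 case) and a margin of 20 (the 60-40 case), and then simulates the "play until the difference is n" test to see how often the weaker side is the one that gets ahead. The function names, the value p = 0.45, the seed and the number of trials are illustrative assumptions and not part of the discussion above; draws are ignored, as in the formulas.

```python
import random

def weaker_side_posterior(p: float, n: int) -> float:
    """p^n / (p^n + (1-p)^n): chance that the side with per-game win
    probability p (p < 1/2) is the one that finished n games ahead."""
    return p**n / (p**n + (1 - p)**n)

def weaker_side_leads_first(p: float, n: int, rng: random.Random) -> bool:
    """Play decisive games until one side leads by n.
    Returns True if the side that got ahead is the weaker one."""
    lead = 0  # weaker side's wins minus stronger side's wins
    while abs(lead) < n:
        lead += 1 if rng.random() < p else -1
    return lead == n

p = 0.45  # assumed per-game win probability of the weaker side
after_10_0 = weaker_side_posterior(p, 10)   # margin of 10 games
after_60_40 = weaker_side_posterior(p, 20)  # margin of 20 games

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000
simulated = sum(weaker_side_leads_first(p, 10, rng) for _ in range(trials)) / trials

print(f"weaker side won, given 10-0:  {after_10_0:.4f}")
print(f"weaker side won, given 60-40: {after_60_40:.4f}")
print(f"stop-at-lead-10 simulation:   {simulated:.4f}")
```

For any p below 1/2 the 60-40 value is smaller than the 10-0 value, because the margin is 20 games instead of 10. With p = 1/2 both values are 1/2.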
>
>>I think that the level of confidence here is not important because the word
>>level of confidence is misleading.
>>
>>the % of the cases that you want to get the right decision from the cases that
>>you make a decision is not the level of confidence and it is the important
>>number.
>>
>>This number is a function of p and n.
>>
>>The only case when 10-0 may be more significant is a case when you do not know p
>>so you suspect that p is bigger when you see 10-0 result but we need to know the
>>a priori distribution of p in order to decide that 10-0 is more significant.
>
>If you start a match and get 10-0 right away, it proves that p is bigger, by any
>reasonable standard of proof.

I was thinking about how to find out whether a minor change makes a version better (for example, whether avoiding the recapture extension helps Crafty). My idea for a test that decides which version is better is to stop when one side leads by 10 games (10 may be too small, and you can decide to stop only when the lead reaches 100).

I agree that a 10-0 result suggests that the stronger side's probability of winning is bigger (this means that p is smaller, and I made an error when I said that p is bigger). So in practice you have a higher chance of being wrong about the change when the match stops after more games (assuming that you stop at a lead of 10), but in the cases where you are wrong after a lot of games your mistake will be smaller. So if you want to decide by playing a lot of games, the best test seems to me to be to stop when you see a lead of a fixed constant (a constant of 10 seems too small, and I suggest 100).

I understand that there is a problem: tuning parameters against yourself may not be productive against other opponents. But I believe that in most cases it will be.
Tuning against yourself may suggest throwing away knowledge that is not used in games against yourself, but if you test only parameter changes such as extensions or small changes in piece values, then I think that changes that are productive against yourself are usually productive against other players.

I think that building a positional test suite and using it can be more productive in practice, so playing a lot of games may not be the best way to tune parameters.

Uri
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.