Author: Bruce Moreland
Date: 11:17:48 01/31/01
On January 31, 2001 at 01:25:33, Uri Blass wrote:

>I disagree that the probability of error in the conclusion is higher in the
>60-40 case.

I don't think that you are allowed to disagree, since my argument has a correct mathematical basis. The probability of 40-60 or worse coming up in a fair coin-flipping contest is 2.84%. The probability of 0-10 coming up is 0.098%.

>I think that +60 -40 is more significant for programmers than 10-0 result
>because the probability that the weaker side wins 60-40 after you know that
>60-40 happened is smaller than the probability that the weaker side won 10-0
>after you know that 10-0 happened.

This is mathematically false, see above.

>assume that the weaker side has probability of p to win(p<1/2)
>
>The probability of 10-0 result is p^10+(1-p)^10
>The probability of the weaker side to win 10-0 is p^10
>
>Conclusion:the probability of the weaker side to win 10-0 when 10-0 happened
>is p^10/(p^10+(1-p)^10)
>
>The probability of 60-40 result is (p^60*(1-p)^40+p^40*(1-p)^60)*C
>When C=100!/(40!*60!)
>The probability of the weaker side to win 60-40 is p^60*(1-p)^40*C

I believe that what you are doing is defining the probability of a result that's exactly 40-60. But you can't do that; you have to account for the possibility that the result will be worse, too. If we were to do a test where the result was a real number, we'd have fractional results, so the choice of a result window one unit wide is really arbitrary, and I believe you have to include results that are worse than 40-60.

Even so, if two equal programs play, the odds that a particular one will lose exactly 40-60 are a little over 1%, according to your own formula (p^60*(1-p)^40*C), which agrees with my own result, and that is still about ten times the odds of a 0-10 result.
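For anyone who wants to check the percentages above, here is a minimal Python sketch. It assumes every game is an independent coin flip with no draws, and the function names `prob_exact` and `prob_at_most` are made up for this illustration.

```python
from math import comb


def prob_exact(wins, games, p=0.5):
    """Probability of scoring exactly `wins` out of `games` (binomial, no draws)."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


def prob_at_most(wins, games, p=0.5):
    """Probability of scoring `wins` or fewer out of `games`."""
    return sum(prob_exact(k, games, p) for k in range(wins + 1))


if __name__ == "__main__":
    # 40-60 or worse for one particular side, equal programs
    print(f"40-60 or worse: {prob_at_most(40, 100):.4%}")
    # exactly 40-60 for one particular side, equal programs
    print(f"exactly 40-60:  {prob_exact(40, 100):.4%}")
    # 0-10 for one particular side, equal programs
    print(f"0-10:           {prob_at_most(0, 10):.4%}")
```

With p = 0.5 the three figures come out at roughly 2.84%, 1.08% and 0.098%, so either way of counting the 40-60 result leaves it well above the 0-10 result.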
>Conclusion the probability of the weaker side to win 60-40 after you know that
>60-40 happened is p^60*(1-p)^40/(p^60*(1-p)^40+p^40*(1-p)^60)=
>p^20/(p^20+(1-p)^20)
>
>I got the last equality by dividing both sides of the equation by p^40*(1-p)^40

Assuming the programs are equal, you should just be able to square both percentages, which results in a lower chance for 0-10 twice than for <= 40-60 twice, which is how I'd do it, as well as a lower chance for 0-10 than for 40-60, which is how you did it.

Not that any of this matters, since the idea that you have to have a duplicate sub-run with an opposite result is silly.

>In general we can say that the probability of the weaker side to win by a
>difference of n after you know that the difference is n is
>p^n/(p^n+(1-p)^n)
>
>It means that if you want to know only which program is stronger then the most
>logical test is to play until the difference is n games.

I don't know where you are getting this math, but it doesn't make any sense to me.

>I think that the level of confidence here is not important because the word
>level of confidence is misleading.
>
>the % of the cases that you want to get the right decision from the cases that
>you make a decision is not the level of confidence and it is the important
>number.
>
>This number is a function of p and n.
>
>The only case when 10-0 may be more significant is a case when you do not know p
>so you suspect that p is bigger when you see 10-0 result but we need to know the
>a priori distribution of p in order to decide that 10-0 is more significant.

If you start a match and get 10-0 right away, it proves that p is bigger, by any reasonable standard of proof.

Of course, for those of you who are going to take two identical programs, play 10-game matches until you get a 10-0 result, and declare that I'm an idiot because of course neither program is better than the other one: all you are doing is rolling dice until you get a rare result, which proves nothing.
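The dice-rolling point can be put in numbers with a rough Python sketch. It assumes two identical programs (p = 0.5), no draws, and independent matches; `prob_sweep` and `prob_sweep_within` are hypothetical helper names, and the sweep probability is the p^10+(1-p)^10 expression from the quoted text.

```python
def prob_sweep(games=10, p=0.5):
    """Chance that a single match of `games` games ends in a sweep for either side."""
    return p**games + (1 - p) ** games


def prob_sweep_within(matches, games=10, p=0.5):
    """Chance of seeing at least one sweep somewhere in `matches` repeated matches."""
    return 1 - (1 - prob_sweep(games, p)) ** matches


if __name__ == "__main__":
    single = prob_sweep()
    print(f"one match:            {single:.4%}")
    print(f"expected matches:     {1 / single:.0f}")
    for n in (1, 100, 512, 2000):
        print(f"within {n:4d} matches: {prob_sweep_within(n):.2%}")
```

Under these assumptions a single match is a sweep about 0.2% of the time, but the chance of eventually seeing one climbs steadily as you keep replaying, which is why a 10-0 obtained by repeating the match until it appears says nothing about p.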
I haven't done any investigation of the following notion, but I think that doing this with two programs that are too similar is also nonsense (for example, trying to figure out if a minor change to a version makes the version better). I don't know how to quantify that, but clearly it is an issue.

bruce

>Uri
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.