Author: Uri Blass
Date: 12:20:04 12/24/00
On December 20, 2000 at 19:06:18, Bruce Moreland wrote:

>On December 20, 2000 at 12:17:19, Uri Blass wrote:
>
>>I think that 25 out of 32 is more significant than 107 out of 200.
>
>I don't think it is a matter of opinion.
>
>You have two programs, A and B. They play 32 games. Each game is either won or
>lost. If one side doesn't score 25 or more, you repeat. If one side scores 25
>or more, you stop and call that program stronger.
>
>You do the same thing with 200 games and use 107 as your stop score.
>
>My experiments showed that for many different rating differences, the odds of
>making a mistake was about the same. For instance, if there is a rating point
>difference of 25 Elo points, in the 200 case the weaker side will score at least
>107 out of 200 about 7% of the time that someone does it, which will lead you to
>a wrong conclusion. In the 32 case, the weaker side will score 25 about 8% of
>the time that someone does it, likewise leading you to a wrong conclusion. So
>your odds of a wrong conclusion are approximately the same. I found this to be
>the same for many Elo point differences out to about 80 points of delta, at
>which point it was hard to tell, since in both cases the weaker side almost
>never gives you a false indicator.
>
>If anything, 107/200 seems to be a little more significant than 25/32.

1) I can say that the experiment "I will continue until the difference is 7 games" is a perfect experiment if you want to find out which program is stronger, because all results with the same difference give you the same probability of being wrong.

2) I think that there is something wrong in your calculations. I find that 25/32 is more significant based on this consideration: 25/32 means a difference of 18 (25 wins against 7), while 107/200 means a difference of 14 (107 wins against 93).

I will explain why a difference of 18 is more significant than a difference of 14.
If you assume that the difference between the programs is constant, 25/32 is always more significant than 107/200. (I assume no draws, to make the problem simpler.)

I use a probability of 0.53125 for the stronger side to win a game.

The probability that the weaker side gets 107/200, based on the binomial distribution, is

  choose(200,107) * 0.46875^107 * 0.53125^93

where choose(200,107) is the number of ways to pick which 107 of the 200 games are the wins.

The probability that the stronger side gets 107/200 is

  choose(200,107) * 0.53125^107 * 0.46875^93

If you divide the first number by the second, choose(200,107) and 93 powers of each probability cancel, and the ratio between the probabilities is

  0.46875^14 / 0.53125^14

You can see that this number does not depend on the number of games, only on the difference.

This means that the probability of being wrong when you reach a conclusion depends only on the number 107-93 and on the difference in strength that you assume between the 2 programs when you do not know which program is stronger.

Uri
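
For anyone who wants to check the ratio argument numerically, here is a minimal Python sketch. It assumes no draws and the same per-game win probability of 0.53125 for the stronger side that the post uses. The function names likelihood_ratio and margin_ratio are made up for this illustration and are not taken from any chess or statistics package.

```python
from math import comb

P_STRONG = 0.53125  # per-game win probability of the stronger side (value used in the post)


def likelihood_ratio(wins, games, p_strong=P_STRONG):
    """P(weaker side scores wins/games) divided by P(stronger side scores wins/games).

    Assumes independent games and no draws.
    """
    p_weak = 1.0 - p_strong
    losses = games - wins
    ways = comb(games, wins)  # number of ways to choose which games are the wins
    weaker = ways * p_weak**wins * p_strong**losses
    stronger = ways * p_strong**wins * p_weak**losses
    return weaker / stronger


def margin_ratio(margin, p_strong=P_STRONG):
    """The same ratio written only in terms of the win-loss margin."""
    p_weak = 1.0 - p_strong
    return (p_weak / p_strong) ** margin


if __name__ == "__main__":
    # 107/200 and 57/100 both have a margin of 14; 25/32 has a margin of 18.
    for wins, games in [(107, 200), (57, 100), (25, 32)]:
        margin = wins - (games - wins)
        print(wins, games, margin, likelihood_ratio(wins, games), margin_ratio(margin))
```

Under these assumptions the two functions agree for every score, results with the same margin give the same ratio whatever the number of games, and the ratio for a margin of 18 is smaller than for a margin of 14.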
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.