Author: Uri Blass
Date: 17:23:25 12/20/00
On December 20, 2000 at 20:05:39, Uri Blass wrote:

>On December 20, 2000 at 19:06:18, Bruce Moreland wrote:
>
>>On December 20, 2000 at 12:17:19, Uri Blass wrote:
>>
>>>I think that 25 out of 32 is more significant than 107 out of 200.
>>
>>I don't think it is a matter of opinion.
>>
>>You have two programs, A and B. They play 32 games. Each game is either won or
>>lost. If one side doesn't score 25 or more, you repeat. If one side scores 25
>>or more, you stop and call that program stronger.
>>
>>You do the same thing with 200 games and use 107 as your stop score.
>>
>>My experiments showed that for many different rating differences, the odds of
>>making a mistake was about the same. For instance, if there is a rating point
>>difference of 25 Elo points, in the 200 case the weaker side will score at least
>>107 out of 200 about 7% of the time that someone does it, which will lead you to
>>a wrong conclusion. In the 32 case, the weaker side will score 25 about 8% of
>>the time that someone does it, likewise leading you to a wrong conclusion.
>
>You are right that if you know before testing that the difference is small then
>25-7 is not so convincing about the question which program is better and it
>seems to be the case when programmers make an upgrade.
>
>In this case 25-7 for the new version is not convincing but 25-7 for the old
>version seems to be more convincing because if you see this kind of result you
>can suspect that the new version has a bug.
>
>Practically if I see 25-7 results between different programs I suspect that the
>difference is clearly bigger than 200 elo so the results seem very convincing to
>me because I do not have an opinion that the difference is small before testing.
>
>Uri

I can add that the 95% confidence is misleading.
It means that you will be wrong in only 5% of the cases by declaring the weaker program to be the stronger one, but it does not mean that in 95% of the cases in which you reach a decision, that decision will be right.

Uri
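
To make the distinction concrete, here is a rough sketch of the stop-score procedure described in the quoted post. It computes, among the matches in which some side reaches the stop score, the share in which that side is the weaker program. The assumptions are mine and not from the thread: games are independent, there are no draws, and the win probability follows the usual logistic Elo formula. The function names are made up for illustration, and the numbers it prints need not match the figures quoted above, which came from separate experiments.

```python
from math import comb


def win_prob(elo_diff):
    """Win probability of the stronger side (assumed logistic Elo model, no draws)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def tail(n, k, p):
    """Probability of at least k wins in n independent games won with probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def wrong_share(n, k, elo_diff):
    """Share of decisions that name the weaker side, given that some side reached k of n.

    Requires k > n / 2 so that both sides cannot reach the stop score in one match.
    """
    p = win_prob(elo_diff)
    weak = tail(n, k, 1.0 - p)   # weaker side reaches the stop score
    strong = tail(n, k, p)       # stronger side reaches the stop score
    return weak / (weak + strong)


if __name__ == "__main__":
    for n, k in ((32, 25), (200, 107)):
        for elo_diff in (0, 25, 200):
            share = wrong_share(n, k, elo_diff)
            print(f"{k}/{n} at {elo_diff} Elo: wrong in {share:.1%} of decisions")
```

With a difference of 0 Elo the share of wrong decisions is one half no matter how strict the stop score is, and it shrinks as the real difference grows. So how often a decision is right depends on the true difference, which the confidence level alone does not tell you.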
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.