Computer Chess Club Archives



Subject: Re: I will continue the match until there is a difference of 7 games

Author: Uri Blass

Date: 02:30:03 12/21/00



On December 21, 2000 at 03:03:43, Bruce Moreland wrote:

>On December 20, 2000 at 20:05:39, Uri Blass wrote:
>
>>On December 20, 2000 at 19:06:18, Bruce Moreland wrote:
>>
>>>On December 20, 2000 at 12:17:19, Uri Blass wrote:
>>>
>>>>I think that 25 out of 32 is more significant than 107 out of 200.
>>>
>>>I don't think it is a matter of opinion.
>>>
>>>You have two programs, A and B.  They play 32 games.  Each game is either won or
>>>lost.  If one side doesn't score 25 or more, you repeat.  If one side scores 25
>>>or more, you stop and call that program stronger.
>>>
>>>You do the same thing with 200 games and use 107 as your stop score.
>>>
>>>My experiments showed that for many different rating differences, the odds of
>>>making a mistake were about the same.  For instance, if there is a rating
>>>difference of 25 Elo points, in the 200 case the weaker side will score at least
>>>107 out of 200 about 7% of the times that one side reaches the stop score, which
>>>will lead you to a wrong conclusion.  In the 32 case, the weaker side will score
>>>at least 25 about 8% of those times, likewise leading you to a wrong conclusion.
>>
>>You are right that if you know before testing that the difference is small, then
>>25-7 is not very convincing evidence of which program is better, and a small
>>difference seems to be the usual case when programmers make an upgrade.
>>
>>In this case 25-7 for the new version is not convincing, but 25-7 for the old
>>version seems more convincing, because if you see this kind of result you
>>can suspect that the new version has a bug.
>
>I'm not sure what this means.  If you know that the distance is big, why test
>anything?  You already know the answer.
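Bruce's stopping rule quoted above can also be checked exactly instead of by simulation. This is a hypothetical sketch, not Bruce's experiment: it assumes only decisive games (as in his setup), the standard logistic Elo formula for the expected score, and invented helper names (expected_score, tail, wrong_conclusion_rate). Because both stop scores are more than half the games, at most one side can reach the threshold in a given match, so the chance of a wrong conclusion is the weaker side's hit probability divided by the total hit probability. Its figures need not match the quoted 7% and 8%, which came from Bruce's own experiments.

```python
from math import comb

def expected_score(elo_diff: float) -> float:
    """Logistic Elo expectation for the side that is elo_diff points stronger (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k wins in n decisive games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def wrong_conclusion_rate(n: int, threshold: int, elo_diff: float) -> float:
    """Hypothetical helper: a match of n decisive games is repeated until one side
    scores at least `threshold` (threshold > n/2, so both sides cannot reach it in
    the same match). Returns the probability that the side that stops the test is
    the weaker one."""
    p_strong = expected_score(elo_diff)
    weak_hits = tail(n, threshold, 1.0 - p_strong)
    strong_hits = tail(n, threshold, p_strong)
    return weak_hits / (weak_hits + strong_hits)

# The two stop rules from the thread, with a 25 Elo gap.
for n, threshold in [(200, 107), (32, 25)]:
    rate = wrong_conclusion_rate(n, threshold, 25.0)
    print(f"{threshold}/{n}: weaker side reaches the stop score {rate:.1%} of the time")
```

Raising the stop score (for example to 110 out of 200) lowers the wrong-conclusion rate, at the cost of more repeated matches before the test stops.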

The problem is that you do not know before testing whether the distance is big
or small.

The only clear thing is that programmers do not gain as much as 200 Elo in a
new version.

It is clearly possible for a new version to be 200 Elo weaker because of a bug,
and if you see 25-7 for the old version then it is logical to suspect that this
is the case, because results like 25-7 are rare when the change is small.
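The argument about 25-7 for the old version can be put in numbers the same way. This sketch assumes decisive games only and the logistic Elo expectation; the helper names and the 0, 25 and 200 Elo gaps are hypothetical illustrations. It computes how likely the old version is to score at least 25 of 32 against the new one. When the versions are equal this is exactly 4514873/2^32, about 0.1%, while a 200 Elo deficit for the new version makes it close to one half.

```python
from math import comb

def expected_score(elo_diff: float) -> float:
    """Logistic Elo expectation for the side that is elo_diff points stronger (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def prob_at_least(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k wins in n decisive games."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical gaps: how many Elo points the old version is stronger than the new one.
results = {gap: prob_at_least(32, 25, expected_score(gap)) for gap in (0, 25, 200)}

for gap, prob in results.items():
    print(f"old version +{gap:>3} Elo: P(old scores >= 25/32) = {prob:.4f}")

# Likelihood ratio: how much more probable a 25-7 (or better) result for the
# old version is under a 200 Elo bug than under no real change at all.
print(f"likelihood ratio 200 vs 0 Elo: {results[200] / results[0]:.0f}")
```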

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.