Computer Chess Club Archives

Subject: Re: I will continue the match until there is a difference of 7 games

Author: Uri Blass

Date: 13:45:00 12/21/00

On December 21, 2000 at 15:13:56, Leen Ammeraal wrote:

>On December 20, 2000 at 12:45:39, Christophe Theron wrote:
>
>> ....
>>
>>Your rule of stopping when you get one of the "significant" results you have
>>listed says approximately the same thing as my "reliability of matches" table.
>>
>>The main point is that, for a given confidence, you can compute a table giving
>>the smallest winning percentage, depending on the number of games played, that
>>is enough to say that the match is significant (once again: within the chosen
>>confidence level).
>>
>>This table should definitely be published in the CCC resource center.
>>
>>The problem is that computing this table is not easy, at least for me. You have
>>to know the relevant formulas, and I actually do not know them.
>>
>>The table I have given in this thread has been computed by numerical simulation,
>>with a program that I have written myself. Each line of the table has been
>>computed, IIRC, with 10000 simulated occurrences of the given matches. The
>>confidence levels and margins of error have been found by trial and error; that's
>>why they are accurate to the first decimal only. Also, I'm not 100% sure
>>that the random number generator I have used is reliable enough for this kind of
>>experiment.
>>
>>Even this crudely computed table is already extremely useful for me. First of all,
>>I have learned a lot about the reliability of matches while I was building it.
>> ...
>>
>>
>>    Christophe
>
>
>Is this table of yours somewhere available online?
>I missed it and I am very much interested in it.
>
>Leen Ammeraal
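
The simulation Christophe describes above can be sketched roughly as follows. This is a minimal sketch, not his program: it assumes two equally strong programs, independent games and a hypothetical fixed draw rate of 30%, and the names simulate_match and significant_percentage are made up for illustration. Python's random module uses the Mersenne Twister, which is widely used for simulations of this kind.

```python
import bisect
import random


def simulate_match(n_games, draw_rate, rng):
    """Total score of one simulated match between two equally strong programs."""
    win_rate = (1.0 - draw_rate) / 2.0  # equal strength: P(win) == P(loss)
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < win_rate:
            score += 1.0
        elif r < win_rate + draw_rate:
            score += 0.5
    return score


def significant_percentage(n_games, confidence, draw_rate=0.3, trials=10000, seed=1):
    """Smallest winning percentage that equal programs reach or exceed in at most
    (1 - confidence) of the simulated matches, or None if no score qualifies."""
    rng = random.Random(seed)
    scores = sorted(simulate_match(n_games, draw_rate, rng) for _ in range(trials))
    allowed = round((1.0 - confidence) * trials)  # false alarms we tolerate
    for half_points in range(2 * n_games + 1):
        s = half_points / 2.0
        reached = len(scores) - bisect.bisect_left(scores, s)  # matches scoring >= s
        if reached <= allowed:
            return 100.0 * s / n_games
    return None


for n in (20, 50, 100):
    print(n, "games:", significant_percentage(n, 0.80), "%")
```

Each printed value is one line of such a table; the required percentage shrinks toward 50% as the number of games grows.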

The main problem is that the level of confidence is misleading.

If you want a situation where 80% of your changes are good changes, then
getting 80% confidence is not enough.

You test the following two hypotheses:
H0: The new program is not better than the old program.
H1: The new program is better than the old program.

80% confidence means that the probability of rejecting H0 when H0 is true is at
most 20% (the worst case being when the two programs are equally strong).
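
An exact version of this definition, which also gives the kind of table discussed in the quoted messages without any simulation, can be sketched by computing the full score distribution of a match. This is a sketch under assumptions: games are independent, the draw rate is a hypothetical 30%, and score_distribution, reject_probability and critical_threshold are illustrative names. It checks that the chance of wrongly declaring "new is better" is at most 20% for equal programs and smaller when the new version is actually weaker.

```python
def score_distribution(n_games, p_win, p_draw):
    """Exact distribution of a match's total score, indexed by half-points 0..2n."""
    p_loss = 1.0 - p_win - p_draw
    dist = [1.0]  # before the first game the score is 0 with probability 1
    for _ in range(n_games):
        new = [0.0] * (len(dist) + 2)
        for hp, prob in enumerate(dist):
            new[hp] += prob * p_loss      # loss adds 0 half-points
            new[hp + 1] += prob * p_draw  # draw adds 1 half-point
            new[hp + 2] += prob * p_win   # win adds 2 half-points
        dist = new
    return dist


def reject_probability(n_games, threshold_hp, p_win, p_draw):
    """P(total score >= threshold), i.e. the chance of declaring 'new is better'."""
    return sum(score_distribution(n_games, p_win, p_draw)[threshold_hp:])


def critical_threshold(n_games, alpha, p_draw=0.3):
    """Smallest threshold (in half-points) whose false-alarm rate is <= alpha
    when both programs are equally strong."""
    p_win = (1.0 - p_draw) / 2.0  # equal strength: P(win) == P(loss)
    dist = score_distribution(n_games, p_win, p_draw)
    tail = 0.0
    # Walk down from the top score until one more score would push the tail above alpha.
    for hp in range(len(dist) - 1, -1, -1):
        if tail + dist[hp] > alpha:
            return hp + 1
        tail += dist[hp]
    return 0


n, alpha, d = 100, 0.20, 0.3
t = critical_threshold(n, alpha, d)
print("threshold:", t / 2, "points out of", n)
for edge in (0.0, -0.02, -0.05):  # new version's P(win) - P(loss); <= 0 means H0 holds
    p_win = (1.0 - d + edge) / 2.0
    print(edge, reject_probability(n, t, p_win, d))
```

Because scores move in half-point steps, the false-alarm rate at the threshold is usually somewhat below 20% rather than exactly 20%.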

It means that in at least 80% of the cases when a bad change is suggested, you
will avoid making it, but it does not mean that at least 80% of the changes you
make are good changes.

You need a different test if you want to know that at least 80% of the changes
are good changes.
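
Bayes' rule shows why the confidence level alone does not fix the share of good changes. A minimal sketch: prior_good (the fraction of suggested changes that really are improvements) and power (the chance that a real improvement passes the test) are hypothetical inputs that the confidence level does not tell you, and fraction_good is an illustrative name. Treating every non-improving change as exactly equal, the worst case, makes the result a lower bound for the given prior and power.

```python
def fraction_good(prior_good, alpha, power):
    """Share of accepted changes that are real improvements.

    prior_good: fraction of suggested changes that are improvements (assumption)
    alpha:      chance of accepting a non-improving change (1 - confidence, worst case)
    power:      chance of accepting a change that is a real improvement (assumption)
    """
    accepted_good = prior_good * power
    accepted_bad = (1.0 - prior_good) * alpha
    return accepted_good / (accepted_good + accepted_bad)


print(fraction_good(0.3, 0.20, 0.5))  # 80% confidence, yet only about 0.52
print(fraction_good(0.5, 0.20, 0.8))  # 80% good only in this kind of special case
```

With 30% of suggestions being real improvements and a test that catches half of them, only about 52% of the accepted changes are good, even at 80% confidence.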

You need to state your assumptions about the size of the changes in order to
find the right test for the problem. If you believe that the changes are very
small, you need a lot of games before making a change if you want to be correct
in 80% of your changes; if you believe that the changes are bigger, you need
fewer games.
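
How the number of games depends on the size of the change can be estimated with the normal approximation to the match score. This is a sketch, not a test from the post: it assumes a hypothetical 30% draw rate, uses the standard logistic Elo formula to turn an Elo gain into an expected score, and games_needed is an illustrative name. Here power is the probability that a real improvement of the given size is accepted.

```python
from math import ceil, sqrt
from statistics import NormalDist


def expected_score(elo_diff):
    """Expected score per game for a given Elo advantage (logistic Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, confidence=0.80, power=0.80, draw_rate=0.3):
    """Approximate games needed so that an equal change is accepted with probability
    at most 1 - confidence, while a change worth elo_diff is accepted with
    probability power (one-sided test, normal approximation)."""
    sigma = sqrt((1.0 - draw_rate) / 4.0)  # per-game score std. dev. for near-equal programs
    edge = expected_score(elo_diff) - 0.5  # gain in expected score per game
    z = NormalDist().inv_cdf
    return ceil(((z(confidence) + z(power)) * sigma / edge) ** 2)


for elo in (5, 20, 50):
    print(elo, "Elo:", games_needed(elo), "games")
```

Because the required number of games grows with the inverse square of the expected-score gain, a change four times smaller needs roughly sixteen times as many games; for changes of a few Elo points this already means thousands of games.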

I believe that playing a lot of games to decide about each change is not
the best approach, because most changes are small and you will be busy playing
games instead of making changes.

There are cases when it is clear that a change is positive without playing
games. For example, if you make your program 1% faster in every position, then
it is clear that the change is positive, and testing the program on a lot of
positions to check that there is no bug is enough.

Uri
