Computer Chess Club Archives


Subject: Re: I will continue the match until there is a diffence of 7 games

Author: Bruce Moreland

Date: 16:06:18 12/20/00


On December 20, 2000 at 12:17:19, Uri Blass wrote:

>I think that 25 out of 32 is more significant than 107 out of 200.

I don't think it is a matter of opinion.

You have two programs, A and B.  They play 32 games.  Each game is either won or
lost.  If one side doesn't score 25 or more, you repeat.  If one side scores 25
or more, you stop and call that program stronger.

You do the same thing with 200 games and use 107 as your stop score.

My experiments showed that for many different rating differences, the odds of
making a mistake were about the same.  For instance, if there is a rating
difference of 25 Elo points, in the 200 case the weaker side will score at least
107 out of 200 about 7% of the time that someone does it, which will lead you to
a wrong conclusion.  In the 32 case, the weaker side will score at least 25
about 8% of the time that someone does it, likewise leading you to a wrong
conclusion.  So your odds of a wrong conclusion are approximately the same.  I
found this to hold for many Elo point differences out to about 80 points of
delta, at which point it was hard to tell, since in both cases the weaker side
almost never gives you a false indicator.

If anything, 107/200 seems to be a little more significant than 25/32.
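
For anyone who wants to check this kind of comparison without running matches, here is a minimal sketch that computes the same quantity exactly from binomial tails instead of by experiment.  It assumes the standard logistic Elo formula for the per-game win probability, no draws, and independent games; the function names are made up for illustration.  The figures it prints need not agree exactly with the experimental ones quoted above.

```python
from math import comb

def win_prob(elo_delta):
    """Per-game win probability of the stronger side (logistic Elo model, no draws)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_delta / 400.0))

def tail(n, k, p):
    """P(at least k wins in n independent games) with per-game win probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

def wrong_call_rate(n, k, elo_delta):
    """Given that some side scored at least k of n (k > n/2),
    the probability that it was the weaker side."""
    p = win_prob(elo_delta)
    strong = tail(n, k, p)
    weak = tail(n, k, 1.0 - p)
    return weak / (strong + weak)

if __name__ == "__main__":
    for delta in (10, 25, 50, 80):
        print(delta,
              round(wrong_call_rate(32, 25, delta), 4),
              round(wrong_call_rate(200, 107, delta), 4))
```

Because every repeated match is an independent retry, the "repeat until someone reaches the stop score" rule does not change this ratio; it only changes how long you wait.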

>It is logical to do an experiment without a fixed number of games to decide
>which program is stronger but I think that the rule to stop when the difference
>is 7 games is not a good rule.

It could be very good if you get to 9-2.  I don't know, I didn't check.  It
would be very bad if you are talking about 504-497.  What this amounts to, like
I said in my other post, is trying to determine which is better, until you
decide you can't tell, at which point you just pick one arbitrarily.

I think this is probably a fine way to figure out which one is better, as long
as you admit that you don't know, after a certain point, rather than saying that
you do know.  In fact, this means of statistical analysis was considered a top
secret by the US government during WWII, when testing medicines.

The idea is that they would run trials until they determined that the result up
until then is unlikely to have been the result of chance.  This might not take
very long, if they had a lot of successes early on, or it might take a long time
if it's hard to tell.  But the difference between what you are saying and what
was considered to be an important secret, is that they eventually reached a
point beyond which they stopped.  That point might be soon if it looked like the
result was probably due to chance, or it could be further out if it looked like
the result may be due to the drug's effectiveness.

I could explain in more detail but it would require a diagram.

>A better rule is to stop if you get one of the following results without
>counting draws
>
>5-0,7-1,9-2,10-3,12-4,13-5,15-6,16-7,18-8,19-9,20-10,22-11,23-12,24-13,26-14
>27-15,28-16,30-17,31-18,32-19,33-20,35-21,36-22,37-23

5-0 isn't good enough.  With the previously mentioned 25 Elo point delta, if
someone wins a match 5-0, it will be the weaker side approximately 36% of the
time.

So 5-0 seems to be much worse than 25 out of 32, or 107 out of 200.

With 13/18 you are wrong only 25% of the time with an Elo delta of 25, which is
better, but still not as good as the others.

>and stop when it is clear that none of these results is possible any more (for
>example if the result is 28-24).
>
>The results (5-0, 7-1, ...) are based on identifying the program which is better
>with 95% confidence.

There is a 5% chance that a program about 100 points worse could be the winner
of a 5-0 match, assuming that this result is reached.  With that in mind, I
don't see how you can say there is a 95% chance that the one is stronger than
the other one.  The chance of a bogus result increases as the difference
between the programs shrinks.
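
The 5-0 case is simple enough to work out directly rather than simulate.  A minimal sketch, assuming the usual logistic Elo win probability, no draws, and independent games (the function names are just for illustration): if the stronger side wins each game with probability p, then among all 5-0 sweeps the weaker side's share is (1-p)^5 / (p^5 + (1-p)^5).

```python
def win_prob(elo_delta):
    """Per-game win probability of the stronger side (logistic Elo model, no draws)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_delta / 400.0))

def weaker_share_of_sweeps(elo_delta, games=5):
    """Given that one side won `games` in a row from the start,
    the probability that the winner was the weaker side."""
    p = win_prob(elo_delta)
    q = 1.0 - p
    return q**games / (p**games + q**games)

if __name__ == "__main__":
    for delta in (0.0001, 25, 100):
        print(delta, round(weaker_share_of_sweeps(delta), 4))
```

Under these assumptions a 100 point gap gives roughly 5%, and a 25 point gap gives about a third, which is in the same neighborhood as the experimental figure mentioned earlier.  A gap of 1/10000 of a point gives essentially 50%.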

>The practical confidence is smaller and I do not know of a good way to calculate
>it except simulation.
>
>The probability to get 5-0 for one program is 1/32 and it means that the
>probability to get 5-0 result between equal programs is 2/32 because both
>programs can win.

You'll get the result very rarely.  The question is, if you get it, what does it
tell you?

If they are 1/10000 of an Elo point apart, the odds that one of them will get
5-0 are about the same as the odds for the other one.  You can't claim that you
can tell the stronger one with 95% accuracy when you can barely tell a 2500
player from a 2400 player with 95% accuracy based upon a 5-0 result.

bruce

>The probability to get 7-1 between equal programs is also less than 1/10 but the
>probability to get one of the results 5-0 or 7-1 is bigger and I did not
>calculate it.
>Calculating the probability to get one of the results 5-0,7-1... is a problem
>that I do not know how to solve except by simulation.
>
>Uri
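
The probability of hitting any one of the stop scores 5-0, 7-1, ... can also be computed exactly, without simulation, by walking over the (wins, losses) states.  The following is a minimal sketch under stated assumptions: draws are ignored, games are independent with a fixed win probability p for program A, and the "no listed result is possible any more" rule is read as "both sides already have more losses than the last listed score allows" (24 or more each).  That reading, and the names `NEED` and `outcome_probs`, are assumptions for illustration.

```python
from functools import lru_cache

# Wins needed to stop when the opponent has 0, 1, 2, ... wins,
# taken from the list 5-0, 7-1, 9-2, ..., 37-23 quoted in the thread.
NEED = [5, 7, 9, 10, 12, 13, 15, 16, 18, 19, 20, 22,
        23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 37]

def outcome_probs(p):
    """Return (P(A declared stronger), P(B declared stronger), P(no decision))
    when A wins each decisive game with probability p."""
    limit = len(NEED)

    @lru_cache(maxsize=None)
    def go(a, b):
        if b < limit and a >= NEED[b]:
            return (1.0, 0.0, 0.0)      # A reached a listed score
        if a < limit and b >= NEED[a]:
            return (0.0, 1.0, 0.0)      # B reached a listed score
        if a >= limit and b >= limit:
            return (0.0, 0.0, 1.0)      # no listed score is reachable any more
        win = go(a + 1, b)
        loss = go(a, b + 1)
        return tuple(p * w + (1.0 - p) * l for w, l in zip(win, loss))

    return go(0, 0)

if __name__ == "__main__":
    a, b, undecided = outcome_probs(0.5)
    print("equal programs: some side is declared stronger with probability", a + b)
```

With p = 0.5 the sum of the first two numbers is the chance that two equal programs produce a "winner" under this rule.  It is at least the 2/32 from 5-0 alone, and every further stop score adds to it.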



Last modified: Thu, 15 Apr 21 08:11:13 -0700
