Computer Chess Club Archives


Subject: Re: How many games are needed to find out which program is stronger?

Author: Bruce Moreland

Date: 16:43:40 09/04/99


On September 04, 1999 at 16:05:24, James T. Walker wrote:

>I think I understand what you are saying but I also think your last couple of
>paragraphs are contrary to statistical probability.  I believe that statistics show
>that the more data you have the more probable that your conclusion is correct.

No, that's not true, and I'll give you an example.  If you do a 1000-game match
and A beats B by a score of 502-498, you have a lot of data, but "A is stronger
than B" is a very strong statement given this evidence.  You can't make it,
because the data doesn't support it, even though it is more likely that the
statement is true than that it isn't.

If you play 100 games, and A beats B by a score of 100-0, the statement "A is
stronger than B" is a very weak statement; it is a conservative conclusion.
The odds that A really is stronger than B are almost 100% in this case, whereas
in the previous case the odds that A is stronger than B are only slightly more
than 50%.

If you get some result from some finite-length match, you can't declare that
your match provides an exact picture of the strength difference between two
programs, but you can make a weaker assertion about the two programs, and
calculate a percentage chance that the assertion is correct.
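
As a rough illustration of calculating such a percentage, here is a minimal
Python sketch.  It is not something from the original discussion, and it rests
on assumptions I am adding: draws are ignored, games are independent, and
nothing is known about A's win rate beforehand (a uniform prior).  The function
name prob_a_stronger is made up for this example.

```python
from math import comb


def prob_a_stronger(wins: int, losses: int) -> float:
    """Posterior chance that A's true win rate exceeds 50%.

    Assumptions (not from the post): draws are ignored, games are
    independent, and the prior on A's win rate is uniform on [0, 1].
    Under those assumptions the posterior is Beta(wins + 1, losses + 1),
    and for integer parameters its upper tail equals a binomial sum.
    """
    m = wins + losses + 1
    # P(rate > 1/2) = P(Binomial(m, 1/2) <= wins), summed exactly with integers.
    favourable = sum(comb(m, k) for k in range(wins + 1))
    return favourable / 2**m


print(prob_a_stronger(502, 498))  # the 1000-game match: a little above one half
print(prob_a_stronger(100, 0))    # the 100-0 sweep: indistinguishable from certainty
```

Under a different prior, or with draws counted, the numbers would shift, but
the contrast between the two matches would remain.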

If you see two guys walking down the street, and one is obviously much larger
than the other, you can assert that the one guy weighs more than the other guy,
and be extremely confident that you are right, without using a scale.

If you see two guys that seem to be about the same size, you can't assert as
confidently that the one is bigger than the other.  You would need more data (a
scale), and if the scale shows a marked difference, you can be confident of this
result as well.

But if these last two guys weigh almost EXACTLY the same amount, beyond the
ability of the scale to differentiate, you may have less confidence that the one
weighs more than the other, than you do using your own eyeball to distinguish
between the two other guys, who are obviously of different size.

My point is that you can be confident of making a true statement as long as you
don't make your statement stronger than the data that supports it.  And if the
data that supports your statement is extremely strong, you can make a stronger
statement than you can if you have a whole bunch of data that is inconclusive.
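
To make "stronger statement" concrete, one could ask how likely it is that A's
true win rate exceeds some threshold, not just 50%.  The sketch below is only
an illustration under assumptions of my own (draws ignored, independent games,
a uniform prior on the win rate); the function name prob_rate_above and the
thresholds are hypothetical.

```python
from math import comb


def prob_rate_above(wins: int, losses: int, pct: int) -> float:
    """Posterior chance that A's true win rate exceeds pct percent.

    Assumptions (not from the post): draws ignored, independent games,
    uniform prior, so the posterior is Beta(wins + 1, losses + 1).
    pct is an integer percentage so the sum can use exact integers.
    """
    m = wins + losses + 1
    # P(rate > t) = P(Binomial(m, t) <= wins) with t = pct / 100.
    numerator = sum(
        comb(m, k) * pct**k * (100 - pct) ** (m - k) for k in range(wins + 1)
    )
    return numerator / 100**m


# Progressively stronger claims about A, checked against each match.
for pct in (50, 60, 90):
    print(pct, prob_rate_above(502, 498, pct), prob_rate_above(100, 0, pct))
```

With the 100-0 result even the claim "A wins more than 90% of the time" holds
up, while the 502-498 result cannot comfortably support the weakest claim in
the list.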

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.