Computer Chess Club Archives


Subject: Re: How many games are needed to find out which program is stronger?

Author: James T. Walker

Date: 18:05:06 09/04/99


On September 04, 1999 at 19:43:40, Bruce Moreland wrote:

>On September 04, 1999 at 16:05:24, James T. Walker wrote:
>
>>I think I understand what you are saying but I also think your last couple of
>>paragraphs are contrary to statistical probability.  I believe that statistics show
>>that the more data you have, the more probable it is that your conclusion is correct.
>
>No, that's not true, and I'll give you an example.  If you do a 1000-game match
>and A beats B by a score of 502-498, you have a lot of data, but the statement
>"A is stronger than B" is a very strong statement given this evidence, and you
>can't make it: the data doesn't support it, even though it is more likely that
>the statement is true than that it isn't.

Well, don't put words in my mouth.  If I had a score of 502-498 I could say that
they are very even with a high degree of confidence.  I would not be stupid
enough to say that "A is stronger than B".  However, if I had a score of 5.0 to
5.0 I would not be so confident that they were even.  That's what I mean by
having a higher probability that your conclusion would be correct.  By the same
reasoning, if I had 600-400 I would be more confident that A is better than B
than if the score were only 6-4.  So I stick to my statement.  More data gives
you a higher degree of certainty.  (Just don't misinterpret the data.)
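One rough way to put numbers on the 502-498 versus 5.0-5.0 and 600-400 versus 6-4 comparisons is a confidence interval for A's average score per game. This is an illustrative sketch, not something either poster ran: it assumes the games are independent and uses the normal approximation, which is crude for a 10-game match. The helper name `score_interval` is hypothetical.

```python
import math


def score_interval(points: float, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for A's expected score per game.

    Normal approximation with per-game variance p * (1 - p). Since each game
    scores 0, 0.5 or 1, p * (1 - p) is an upper bound on the true per-game
    variance, so draws can only make the real interval narrower than this.
    """
    p = points / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width


# (points scored by A, games played) for the scores discussed in the thread
for points, games in [(502, 1000), (5.0, 10), (600, 1000), (6, 10)]:
    lo, hi = score_interval(points, games)
    print(f"{points}/{games}: score between {lo:.3f} and {hi:.3f}")
```

With these inputs the 600-400 interval stays above an even score (roughly 0.57 to 0.63), while the 6-4 interval runs from about 0.30 to 0.90, so the same 60% score supports "A is better than B" only in the longer match. The width shrinks roughly as one over the square root of the number of games.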


>
>If you play 100 games, and A beats B by a score of 100-0, the statement "A is
>stronger than B" is a very weak statement; it is a conservative conclusion.
>The odds that A really is stronger than B are almost 100% in this case, whereas
>in the previous case the odds that A is stronger than B are only slightly more
>than 50%.

There would be very little argument in this case, but again you are interpreting
the data incorrectly to make your point.  "A is stronger than B" is perfectly
valid, and there is nothing weak about it.  If you said "Maybe A is better than
B", that's a weak statement, but I wouldn't say that given the above data (100-0),
even in the face of a small probability that the next 100 might be 0-100.
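The "odds that A really is stronger than B" can be estimated with the likelihood-of-superiority formula often used when testing chess engines, a normal approximation that looks only at wins and losses. This sketch assumes every game in the example scores was decisive (502 wins and 498 losses; 100 wins and 0 losses); `likelihood_of_superiority` is a hypothetical helper name.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that A is stronger than B.

    Assumption: draws say nothing about which side is stronger in this
    formula, so only decisive games (wins and losses) are counted.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * decisive)))


print(f"{likelihood_of_superiority(502, 498):.3f}")  # 0.550
print(f"{likelihood_of_superiority(100, 0):.6f}")    # 1.000000
```

The numbers line up with Bruce's description: barely above 50% for 502-498, and indistinguishable from 100% for 100-0.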

>
>If you get some result from some finite-length match, you can't declare that
>your match provides an exact picture of the strength difference between two
>programs, but you can make a weaker assertion about the two programs, and
>calculate a percentage chance that the assertion is correct.
>
>If you see two guys walking down the street, and one is obviously much larger
>than the other, you can assert that the one guy weighs more than the other guy,
>and be extremely confident that you are right, without using a scale.
>
>If you see two guys that seem to be about the same size, you can't assert as
>confidently that the one is bigger than the other; you would need more data (a
>scale), and if the scale shows a marked difference, you can be confident of this
>result as well.
>
>But if these last two guys weigh almost EXACTLY the same amount, beyond the
>ability of the scale to differentiate, you may have less confidence that the one
>weighs more than the other, than you do using your own eyeball to distinguish
>between the two other guys, who are obviously of different size.
>
>My point is that you can be confident of making a true statement as long as you
>don't make your statement stronger than the data that supports it.  And if the
>data that supports your statement is extremely strong, you can make a stronger
>statement than you can if you have a whole bunch of data that is inconclusive.

   (We agree here.)
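Another way to gauge how strong a statement the data supports is an exact one-sided binomial test: if the two programs were equally strong, how likely is a result at least this lopsided? This sketch again counts only decisive games and treats each one as a fair coin flip under that hypothesis; `p_value_a_stronger` is a hypothetical name.

```python
import math


def p_value_a_stronger(wins: int, losses: int) -> float:
    """One-sided exact binomial p-value for the claim "A is stronger than B".

    Under the hypothesis that the programs are equally strong, each decisive
    game is a fair coin flip. The result is the probability that A wins at
    least `wins` of the `wins + losses` decisive games purely by chance.
    """
    n = wins + losses
    # Count the outcomes at least as favourable to A as the one observed.
    tail = sum(math.comb(n, k) for k in range(wins, n + 1))
    # Exact integer arithmetic, then one correctly rounded division.
    return tail / 2 ** n


for wins, losses in [(502, 498), (6, 4), (600, 400), (100, 0)]:
    print(f"{wins}-{losses}: p = {p_value_a_stronger(wins, losses):.3g}")
```

Small values mean "A is stronger than B" is well supported. Here 502-498 gives about 0.46 and 6-4 gives 386/1024 (about 0.38), while 600-400 and 100-0 give vanishingly small values, so only those two results justify the strong statement.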

>
>bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.