Computer Chess Club Archives


Subject: Re: Small number statistics and small differences

Author: Bruce Moreland

Date: 17:04:11 08/14/98


On August 14, 1998 at 14:23:28, Peter Fendrich wrote:

>I don't understand how you guys get these results. In my world probability is
>completely different from confidence. If the probability is odds, chance or
>whatever, the confidence is a measurement of how sure we can be of the
>probability value itself. The information given by a 200-game match gives far
>more confident probabilities than a 4-game match. In the 4-game match it's not
>even applicable to use the term confidence. It's as worthless to compute as
>measuring 1/100 of a second with your wristwatch.

In my world I don't know what I am talking about.  I am trying to get through
this dense statistics book, but it is taking a while.

My reason for posting on this topic is that people seem to think that they can
do an N-game match, with some suitably comforting value of N, and take the
results as significant, which in this case means, I guess, truthful, regardless
of the score.

I suspect that in matches that are fairly close, which I think most of them
will be, you will end up with, for lack of a better term (yet), a range of
not incredibly unlikely error that exceeds the Elo delta that can be computed
from the score of the match.
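
To make the comparison above concrete, here is a rough sketch of how an Elo delta and an error range could be computed from a match score. It assumes the usual logistic Elo model, independent games, and a normal approximation for the score (which is poor for very short matches). The function names and the 90/40/70 result are made up for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction, logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, point, high) Elo estimates for a match result.

    Assumption: normal approximation to the mean game score, z=1.96
    for a roughly 95% range. Not trustworthy for tiny matches.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win=1, draw=0.5, loss=0).
    variance = max((wins + 0.25 * draws) / n - score ** 2, 0.0)
    margin = z * math.sqrt(variance / n)
    # Clamp so that 0% or 100% scores do not produce infinite Elo.
    eps = 1e-9
    clamp = lambda s: min(max(s, eps), 1.0 - eps)
    return (elo_from_score(clamp(score - margin)),
            elo_from_score(clamp(score)),
            elo_from_score(clamp(score + margin)))

# Hypothetical 200-game match: +90 =40 -70, a 55% score.
low, point, high = elo_interval(90, 40, 70)
print(f"Elo delta {point:+.0f}, range {low:+.0f} to {high:+.0f}")
```

In this made-up example the low end of the range is below zero, so even a 55% score over 200 games would not rule out the programs being equal under these assumptions.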

I think that most close matches are likely to produce an inconclusive result,
rather than a hard-fought and exhausting match where "the best program won".

I think the amount by which you might be mistaken would decrease if you ran more
trials, but the score of a match between two approximately equal programs would
tend to tighten up, as well.  You might be better able to determine that "A and
B aren't too much different", but it still could be a stretch to say "A is
better than B".
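
As a rough illustration of how slowly the error shrinks, the sketch below estimates how many games it might take before a given true Elo difference stands clear of the error margin. It assumes independent games, the logistic Elo model, a normal approximation, and a per-game variance of (1 - draw_rate) / 4, which is exact only for equal programs. The 30% draw rate and the function names are hypothetical.

```python
import math

def expected_score(elo_diff):
    """Expected score fraction for the stronger side, logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_rate=0.3, z=1.96):
    """Games needed so that z standard errors fit inside the expected edge.

    Assumption: variance per game is (1 - draw_rate) / 4, i.e. the
    value for two equal programs, used here as an approximation.
    """
    edge = expected_score(elo_diff) - 0.5
    if edge <= 0.0:
        raise ValueError("elo_diff must be positive")
    variance = (1.0 - draw_rate) / 4.0
    return math.ceil(z ** 2 * variance / edge ** 2)

for diff in (10, 20, 50, 100):
    print(f"{diff:>3} Elo: about {games_needed(diff)} games")
```

Because the margin falls only as one over the square root of the number of games, halving the Elo difference you want to detect roughly quadruples the number of games required.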

I don't have any problem with the matches themselves, only with the conclusions.

A 4-0 blowout *should* be a rare thing, and even though the error margin is
large, it is still a massive blowout.  It might be interesting to find out how
often it happens between roughly equal programs.  It should happen just a few
percent of the time, depending upon draw percentage (fewer draws means it should
happen more often).
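
A quick way to check this is to compute the chance directly. The sketch below assumes two exactly equal programs, independent games, no color advantage, and that "4-0" means four decisive wins with no draws. The function name and the draw rates tried are just for illustration.

```python
def sweep_probability(draw_rate, games=4):
    """Chance that either of two equal programs wins every game.

    Assumption: each game is independent, with the non-draw
    probability split evenly between the two sides.
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be between 0 and 1")
    win = (1.0 - draw_rate) / 2.0
    # Factor of 2 because either program could be the one that sweeps.
    return 2.0 * win ** games

for d in (0.0, 0.1, 0.3, 0.5):
    print(f"draw rate {d:.0%}: sweep chance {sweep_probability(d):.2%}")
```

With no draws at all the chance is 2 * (1/2)^4 = 12.5%, and it drops quickly as the draw rate goes up, to about 3% at a 30% draw rate.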

I would love to hear from anyone who is competent in this area, who could tell
me with authority where I am messing up.

I freely admit I might be wrong, and I've heard from several people who think
that 4-0 is pretty common and means nothing, but I really would like to figure
out *why*, since this should be rare between equal programs.

bruce


