Computer Chess Club Archives


Subject: Re: Small number statistics and small differences

Author: Dan Homan

Date: 08:45:24 08/14/98



On August 14, 1998 at 06:08:36, Dan Homan wrote:

>
>The problem with small number statistics is that they can be very
>misleading.  A 4-0 result in a 4-game match between nearly equal
>programs (with a 20% chance of a draw) happens about 1/40th of the time.
>A 3.5-0.5 (or better) result happens about 1/13th of the time.
>
>If program A beats program B by a score of 4-0, this means that A has
>a 97% (roughly) chance of being stronger than B.  So it seems like a
>pretty good bet that A is better than B, but consider the following
>scenario.

The above paragraph is incorrect, because I didn't consider the
chance that the weaker program could also go 4-0.  The conclusion
that A is stronger than B from a 4-0 result is considerably less
than 97% accurate.  For nearly equal programs (that differ by
only a few percent in winning chances) the conclusion is correct more
like 60-70% of the time.
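
The figures in this thread can be checked with a short calculation. The
Python sketch below assumes every game is independent with fixed
win/draw/loss probabilities, and that before the match either program is
equally likely to be the stronger one (a 50/50 prior that the post does not
state explicitly). The function names and the example win probabilities of
0.42 and 0.44 for the stronger program are illustrative choices, not values
from the original message.

```python
from math import comb


def prob_sweep(p_win: float, games: int = 4) -> float:
    """Probability of winning every game of the match (e.g. 4-0)."""
    return p_win ** games


def prob_at_most_one_draw(p_win: float, p_draw: float, games: int = 4) -> float:
    """Probability of a sweep or a sweep with exactly one draw (e.g. 3.5-0.5 or better)."""
    all_wins = p_win ** games
    one_draw = comb(games, 1) * p_win ** (games - 1) * p_draw
    return all_wins + one_draw


def prob_sweep_winner_is_stronger(p_strong_win: float, p_draw: float,
                                  games: int = 4) -> float:
    """Chance that the program that swept the match is really the stronger one.

    Assumes a 50/50 prior on which program is stronger, so the prior cancels
    and only the two sweep probabilities remain.
    """
    p_weak_win = 1.0 - p_draw - p_strong_win
    strong_sweeps = p_strong_win ** games
    weak_sweeps = p_weak_win ** games
    return strong_sweeps / (strong_sweeps + weak_sweeps)


if __name__ == "__main__":
    # Equal programs with 20% draws: each side wins 40% of the games.
    print("4-0:", prob_sweep(0.40))
    print("3.5-0.5 or better:", prob_at_most_one_draw(0.40, 0.20))
    # Stronger program wins 42% or 44% of the games instead of 40%.
    for p in (0.42, 0.44):
        print(p, prob_sweep_winner_is_stronger(p, 0.20))
```

With equal programs the first two values come out near 1/40 and 1/13, and
under these assumptions the last two land at roughly 60% and 69%.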

>
>Say that you use this 4-game match technique to test new versions of
>your program versus older versions.  Whenever you make a change you
>run one of these matches and decide to keep the change only if you get
>a 4-0 result.  Because you have a very well developed program, most
>changes will have almost no effect on playing strength.  Even changes
>that do increase the playing strength slightly will not affect the
>1/40 odds of getting a 4-0 result very much.  So, you will get a
>4-0 result about 1/40th of the time - regardless of whether the change
>you make is good or bad.  So using these 4-game matches to decide
>on playing strength increases will cause you to randomly select
>which versions to keep and which to discard.
>
>So a 97% confidence isn't that helpful after all - at least not for
>what we chess programmers do.  The problem is that we are trying to
>discriminate small differences in playing strength and a 4-game match
>just can't do that with any reliability.
>
> - Dan
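
The keep-only-on-4-0 scenario can be sketched numerically as well. The
example below assumes independent games, a 20% draw rate, and a purely
hypothetical mix of changes (a quarter slightly harmful, half neutral, a
quarter slightly helpful, shifting the win probability by two points either
way). None of these numbers come from the post; they only illustrate how
little the 4-0 filter separates good changes from bad ones.

```python
def acceptance_rate(p_win: float, games: int = 4) -> float:
    """Chance that the new version sweeps the match and the change is kept."""
    return p_win ** games


# Hypothetical mix: (label, share of all changes, new version's win probability).
CHANGES = [
    ("slightly worse", 0.25, 0.38),
    ("no effect", 0.50, 0.40),
    ("slightly better", 0.25, 0.42),
]


def share_of_kept_changes(changes, games: int = 4) -> dict:
    """Among changes that pass the 4-0 filter, the fraction of each kind."""
    kept = {label: share * acceptance_rate(p_win, games)
            for label, share, p_win in changes}
    total = sum(kept.values())
    return {label: value / total for label, value in kept.items()}


if __name__ == "__main__":
    for label, _, p_win in CHANGES:
        odds = 1.0 / acceptance_rate(p_win)
        print(f"{label:16s} kept about 1 time in {odds:.0f}")
    for label, fraction in share_of_kept_changes(CHANGES).items():
        print(f"{label:16s} {fraction:.1%} of kept changes")
```

Under this assumed mix, only about 30% of the changes that survive the
filter are real improvements, barely above the 25% that went in, and about
a fifth of the survivors make the program weaker.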




Last modified: Thu, 15 Apr 21 08:11:13 -0700
