Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Roger Brown

Date: 10:03:04 01/04/04


>When you play 100 games, the 95% confidence interval will be roughly +/- 60 points.
>So if you end up with a difference greater than 120 points, you can say with 95%
>confidence that the higher-rated program is better than the lower-rated program.
>So there would be a 1 in 20 chance that your test results are not correct.
>If they are rated less than 120 points apart, statistically you cannot make a claim
>either way; you need more games. The closer the ratings are, the more games you
>need to achieve a 95% confidence level.

So it is a hundred games that you suggest. Two hundred should be even better.
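The +/- 60 figure, and what doubling to 200 games buys, can be checked with a back-of-the-envelope calculation. This is a minimal sketch, assuming independent games, a normal approximation to the match score, and the logistic Elo curve linearized around the observed score; the helpers `elo_margin_95` and `games_needed` and their parameters are hypothetical, not taken from any chess tool.

```python
import math


def _elo_per_score(score: float) -> float:
    # Slope of D = 400 * log10(p / (1 - p)) at p = score (Elo per unit of score).
    return 400.0 / (math.log(10) * score * (1.0 - score))


def elo_margin_95(n_games: int, score: float = 0.5, per_game_sd: float = 0.5) -> float:
    """Approximate 95% half-width, in Elo, of a rating estimate from n_games.

    per_game_sd = 0.5 is the worst case (no draws); draws make it smaller.
    """
    se_score = per_game_sd / math.sqrt(n_games)  # standard error of the mean score
    return 1.96 * se_score * _elo_per_score(score)  # two-sided 95%, converted to Elo


def games_needed(margin_elo: float, score: float = 0.5, per_game_sd: float = 0.5) -> int:
    """Games required for the 95% half-width to shrink to margin_elo."""
    n = (1.96 * per_game_sd * _elo_per_score(score) / margin_elo) ** 2
    return math.ceil(n)
```

With the no-draw defaults, 100 games give about +/- 68 Elo and 200 games about +/- 48, since the margin shrinks with the square root of the number of games, not linearly. A realistic draw rate lowers `per_game_sd` and pulls the 100-game figure down toward the +/- 60 quoted. Halving the margin you want to resolve roughly quadruples `games_needed`, which is why close ratings need many more games.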

>Statistically, the 100 games do NOT have to be against the same engine. You
>can design a tournament with 11 engines and have them play a round robin of 10
>cycles, and your confidence level for their ratings relative to each other is
>just as valid as for a single 100-game match. Alternatively, you can play 100 rounds
>with 101 engines, each playing one game against every other; statistically it is
>the same. The key is simply 100 games. Think about human ratings: you do not
>play 100 games against the same player, you might play 100 games against 90
>players, and statistically it does not matter.
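The arithmetic behind the two tournament layouts can be checked quickly: in both, every engine ends up with 100 games. This is a minimal sketch; `games_per_engine` is a hypothetical helper that only counts games (colors ignored) and does not model the statistical equivalence the post asserts.

```python
from collections import Counter
from itertools import combinations


def games_per_engine(n_engines: int, cycles: int) -> Counter:
    """Count games per engine when every pair meets once per cycle."""
    counts: Counter = Counter()
    for _ in range(cycles):
        for a, b in combinations(range(n_engines), 2):
            counts[a] += 1  # each pairing is one game for both engines
            counts[b] += 1
    return counts


# 11 engines x 10 cycles and 101 engines x 1 cycle: 100 games per engine either way.
eleven = games_per_engine(11, 10)
hundred_one = games_per_engine(101, 1)
print(set(eleven.values()), set(hundred_one.values()))
```

The 11-engine layout needs 550 games in total and the 101-engine layout 5,050, yet each engine's rating still rests on 100 games of its own.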

Agreed, but *ahem*, I do not have enough engines to play 100 games against 100
different opponents. Your point is dramatically made, though.

I want to use engines near the top, for reasons of development, stability of
use, and substantial debugging and testing. In short, engines like Crafty, Yace and
Little Goliath, which are featured in several tournaments and can be used with a
robust, automatic and computer-friendly (memory, CPU usage, etc.) GUI like

I want no issues with draw claims and the like...


Thanks for the feedback - and for the modified Crafties!



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.