Computer Chess Club Archives


Subject: Re: Program comparison

Author: J. Wesley Cleveland

Date: 10:21:05 03/03/99


On March 03, 1999 at 10:06:33, Shaun Brewer wrote:

>I have been experimenting with openings and therefore played many games
>attempting to determine if a certain book is better or not. As my PC is needed
>for other tasks, I have to interrupt the games and start again; I then amalgamate
>the results of several batches of games in an attempt to get something
>statistically relevant.
>
>Here are the example scores for one such set of batches, all played on the same
>machine using the same program with books a and b constant for all batches.
>
>a      b
>26   - 35
> 9.5 -  6.5
> 7   - 15
>58.5 - 54.5
>39.5 - 45.5
>
>I am rapidly coming to the conclusion that hundreds of games would be required
>to be able to state that a is better than b, and this would also apply to
>program v program tests.
>
>It would also be very easy to stop a series of games at a point that backs a
>particular argument.
>
>What level of confidence can be attached to computer tournaments that the winner
>is the best?
>
>Is it true that computer v computer results vary more than human v human
>results?

There is a simple statistical test, called Student's t-test, that will give
the probability of one program being superior. Unfortunately, I don't remember
it well enough to give the formula. Your data, however, does not look like it is
statistically significant.
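
As a rough check on that impression, here is a minimal sketch in Python. It is not the t-test itself but a simpler normal-approximation (z) test on the pooled score, and it rests on assumptions that are not in the post: every game hands out exactly one point, the batches can simply be added together, and the per-game variance is taken at its maximum of 0.25 (as if there were no draws), which makes the result conservative. The function name pooled_z_test is made up for illustration.

```python
import math

# Batch scores (book a, book b) copied from the quoted post.
batches = [(26, 35), (9.5, 6.5), (7, 15), (58.5, 54.5), (39.5, 45.5)]

def pooled_z_test(batches):
    """Normal-approximation test of the hypothesis that both books score 50%."""
    score_a = sum(a for a, _ in batches)
    score_b = sum(b for _, b in batches)
    games = score_a + score_b  # assumption: each game distributes exactly one point
    # Assumption: per-game variance at its maximum, 0.25 (no draws).
    # Draws lower the true variance, so this understates |z| if anything.
    std_err = math.sqrt(games * 0.25)
    z = (score_a - games / 2) / std_err
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return score_a, score_b, games, z, p

score_a, score_b, games, z, p = pooled_z_test(batches)
print(f"a = {score_a}, b = {score_b}, games = {games:.0f}")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Pooled over all five batches the score is 140.5 - 156.5 in 297 games, which gives z of about -0.93 and a two-sided p of about 0.35 under these assumptions. That is well short of the usual 0.05 threshold, so the sketch agrees that the data does not show either book to be better.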



Last modified: Thu, 15 Apr 21 08:11:13 -0700