Author: J. Wesley Cleveland
Date: 10:21:05 03/03/99
On March 03, 1999 at 10:06:33, Shaun Brewer wrote:

>I have been experimenting with openings and therefore played many games
>attempting to determine if a certain book is better or not. As my PC is needed
>for other tasks, I have to interrupt the games and start again; I then amalgamate
>the results of several batches of games in an attempt to get something
>statistically relevant.
>
>Here are the example scores for one such set of batches, all played on the same
>machine using the same program, with books a and b constant for all batches.
>
>    a       b
>  26   -  35
>   9.5 -   6.5
>   7   -  15
>  58.5 -  54.5
>  39.5 -  45.5
>
>I am rapidly coming to the conclusion that hundreds of games would be required
>to be able to state that a is better than b, and this would also apply to
>program v program tests.
>
>It would also be very easy to stop a series of games at a point that backs a
>particular argument.
>
>What level of confidence can be attached to computer tournaments that the winner
>is the best?
>
>Is it true that computer v computer results vary more than human v human
>results?

There is a simple statistical test, Student's t-test, that will give the probability that one program is superior. Unfortunately, I don't remember it well enough to give the formula. Your data, however, does not look statistically significant.
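
For anyone who wants to run the numbers, here is a minimal sketch of a paired Student's t-test on the batch scores above, assuming each batch is treated as one paired observation (this ignores the unequal batch sizes, so take it as a rough check rather than a definitive analysis). The paired statistic is t = mean(d) / (s_d / sqrt(n)) on n-1 degrees of freedom, where d holds the per-batch score differences. In Python with SciPy:

    # Paired Student's t-test on the five batch scores quoted above.
    # Assumption: each batch counts as one paired observation,
    # ignoring that the batches contain different numbers of games.
    from scipy.stats import ttest_rel

    a = [26, 9.5, 7, 58.5, 39.5]   # book a's score in each batch
    b = [35, 6.5, 15, 54.5, 45.5]  # book b's score in each batch

    t, p = ttest_rel(a, b)  # two-sided paired t-test, n-1 = 4 dof
    print("t = %.3f, p = %.3f" % (t, p))
    # A large p-value (well above 0.05) means the data do not show a
    # statistically significant difference between the two books.

On this data the p-value comes out far above 0.05, which agrees with the point above: five batches are nowhere near enough games to call one book better than the other.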