Author: Maurizio De Leo
Date: 09:21:20 08/30/05
>Under valid and controlled conditions it still seems logical to me to stop a
>test after a 5-0 result and conclude that the winning program is probably the
>stronger one.
>>I don't put much credence in any result of less than 30 games.
>>After 30 games, then you get a lot more plausibility.
>You didn't give any reason for this, so I don't understand. A 6-0 says more
>about engine strength than the above match result with over 100000 games.

Dann is right, I think.

The confidence interval calculation assumes that the score of a game is a random variable with a mean value between -1 and 1 (a function of the Elo difference between the programs) and some standard deviation. If the games are independent, the total score will approximate the product (mean * number of games), and its standard deviation relative to that total shrinks as more games are played. With enough games, the "confidence" reaches 95% when the performance difference between the two programs is more than about 2 standard deviations (3 standard deviations corresponds to roughly 99.7%).

However, this assumes a normal distribution. By the central limit theorem, that assumption can be made for the sum of any repeated random variable, as long as the experiments are independent and there are "enough" of them. This "enough" is indeed given in most statistics books as 30.

Maurizio
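To make the interval calculation above concrete, here is a minimal sketch in Python. It assumes a win counts +1, a draw 0 and a loss -1, and it uses the plain normal approximation with the usual 1.96 factor for 95%. The function name `score_interval` and its parameters are made up for illustration and do not come from any rating tool.

```python
import math

def score_interval(wins, draws, losses, z=1.96):
    """Normal-approximation interval for the mean per-game score.

    Assumed scoring (illustrative): win = +1, draw = 0, loss = -1.
    z = 1.96 gives a two-sided 95% interval under the normal assumption.
    """
    n = wins + draws + losses
    mean = (wins - losses) / n
    # Variance of the per-game score: E[x^2] - mean^2.
    # x^2 is 1 for a win or a loss and 0 for a draw.
    var = max((wins + losses) / n - mean ** 2, 0.0)
    # The standard error of the mean shrinks like 1/sqrt(n).
    stderr = math.sqrt(var / n)
    return mean - z * stderr, mean + z * stderr

# A hypothetical 30-game match: 18 wins, 6 draws, 6 losses.
print(score_interval(18, 6, 6))

# A 5-0 sweep: the observed variance is zero, so the interval collapses.
print(score_interval(5, 0, 0))
```

For the hypothetical 30-game match the interval is roughly 0.11 to 0.69, so it stays above zero. For the 5-0 sweep the formula returns a zero-width interval at 1.0, which is not real certainty: with so few games the normal approximation simply has nothing to work with, and that is the practical reason for wanting something like 30 games.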
Last modified: Thu, 15 Apr 21 08:11:13 -0700