Author: Heinz van Kempen
Date: 01:58:46 02/04/06
On February 03, 2006 at 19:26:43, Joseph Ciarrochi wrote:

>Here is the stats table I promised Heinz and others who might be interested.
>
>Table: Percentage scores needed to conclude one engine is likely to be better
>than the other in head-to-head competition
>
>                    Cut-off (alpha)
>Number of games     5%      1%      0.1%
>10                  75      85      95
>20                  67.5    75      80
>30                  63.3    70      73.3
>40                  62.5    66.3    71.3
>50                  61      65      68
>75                  58.6    61.3    66
>100                 57      60      63
>150                 55.7    58.3    60
>200                 54.8    57      59.8
>300                 54.2    55.8    57.5
>500                 53.1    54.3    55.3
>1000                52.2    53.1    54.1
>
>Notes:
>• Based on 10000 randomly chosen samples. Thus, these values are approximate, though with such a large sample, the values should be close to the “true” value.
>• Alpha represents the percentage of time that the score occurred by chance (i.e., occurred even though we know the true value to be .50, or 50%). Alpha is basically the odds of incorrectly saying two engines differ in head-to-head competition.
>• Traditionally, an alpha of .05 is used as a cut-off, but I think this is a bit too lenient. I would recommend 1% or 0.1% to be reasonably confident.
>• Draw rate assumed to be .32 (based on CEGT 40/40 draw rates). Variations in draw rate will slightly affect cut-off levels, but I don't think the difference will be big.
>• Engines assumed to play equal numbers of games as white and black.
>• In cases where a particular score fell both above and below the cut-off, the next score above the cut-off was chosen. This leads to conservative estimates (e.g., for n of 10, a score of 7 occurred above and below the 5% cut-off; therefore, 7.5 became the cut-off).
>• Type 1 error = saying an engine is better in head-to-head competition when there is actually no difference. The chance of making a type 1 error increases with the number of comparisons you make. If you conduct C comparisons, the odds of making at least one type 1 error = 1 - (1 - alpha)^C (^ = raised to the power of C).
>• It is critical that you choose your sample size ahead of time, and do not draw any conclusions until you have run the full tournament. It is incorrect, statistically, to watch the running of the tournament, wait until an engine reaches a cut-off, and then stop the tournament.

Hi Joseph,

thanks for your work and your interesting table. We will put it on the CEGT website under ratings and comments.

Keep up the good work
Heinz
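
For anyone who wants to check values like those in the quoted table, here is a minimal Python sketch of how such cut-offs could be simulated. It is not Joseph's actual procedure, which the post does not describe in detail. The function names (`simulate_scores`, `cutoff`, `familywise_error`) are made up for illustration. The sketch assumes a one-sided test and equal engines with a .32 draw rate, and it ignores any white/black advantage. When a score straddles the cut-off it takes the smallest score whose upper-tail frequency is at most alpha, which is only an approximation of the conservative rule in the notes. The last function is simply the 1 - (1 - alpha)^C formula from the notes.

```python
import random


def simulate_scores(n_games, draw_rate=0.32, n_samples=10_000, seed=0):
    """Percentage scores of one engine over many simulated matches,
    assuming both engines are equally strong (true score = 50%)."""
    rng = random.Random(seed)
    # Assumption: equal engines, so wins and losses share what draws leave over.
    win_rate = (1.0 - draw_rate) / 2.0
    scores = []
    for _ in range(n_samples):
        points = 0.0
        for _ in range(n_games):
            r = rng.random()
            if r < win_rate:
                points += 1.0          # win
            elif r < win_rate + draw_rate:
                points += 0.5          # draw
            # otherwise a loss: 0 points
        scores.append(100.0 * points / n_games)
    return scores


def cutoff(scores, alpha):
    """Smallest simulated score whose upper-tail frequency
    (share of samples >= that score) is at most alpha."""
    ordered = sorted(scores)
    n = len(ordered)
    for i, score in enumerate(ordered):
        if i > 0 and ordered[i - 1] == score:
            continue  # only test the first occurrence of each distinct score
        if (n - i) / n <= alpha:
            return score
    return None  # no simulated score is rare enough for this alpha


def familywise_error(alpha, comparisons):
    """Chance of at least one type 1 error over several comparisons:
    1 - (1 - alpha)^C, as given in the notes."""
    return 1.0 - (1.0 - alpha) ** comparisons


if __name__ == "__main__":
    scores = simulate_scores(100)
    for alpha in (0.05, 0.01, 0.001):
        print(f"100 games, alpha={alpha}: cut-off ~ {cutoff(scores, alpha)}%")
    print(f"10 comparisons at alpha=0.05: {familywise_error(0.05, 10):.3f}")
```

Because the samples are random, the simulated cut-offs will shift a little with the seed and sample count, so small differences from the table are to be expected.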
Last modified: Thu, 15 Apr 21 08:11:13 -0700