Author: Heinz van Kempen
Date: 06:25:17 02/04/06
On February 04, 2006 at 05:53:37, Joseph Ciarrochi wrote:

>Sounds good, Heinz.
>
>I think Vasik raises some good points, which suggest that I should add a couple
>more notes to make sure people don't misuse the table.
>
>In addition to the other notes, I would add the following:
>
>** The values in the table assume that you are testing a directional hypothesis,
>e.g., that engine A does better than B. If you have no idea which engine
>might be better, then your hypothesis is non-directional and you must double the
>alpha rate. This means that if you select the .05 criterion and you have a
>non-directional hypothesis, you are in fact using a .1 criterion, and if you
>choose the .01 criterion, you are using a .02 criterion. I recommend using at
>least the .01 criterion in these instances, and preferably the .1%
>criterion.
>
>** Even if you get a significant result, the result may not generalize well to
>future tests. One important question is: to what extent are the openings you
>used in your test representative of the openings the engine would actually use
>when playing? I think there is no way you can get a representative sample of
>opening positions with only, say, ten openings. You probably need at least 50
>different openings. If you are going to use a particular opening book with an
>engine, it would be ideal to sample a fair number of different openings from
>this opening book.
>
>On February 04, 2006 at 04:58:46, Heinz van Kempen wrote:
>
>>On February 03, 2006 at 19:26:43, Joseph Ciarrochi wrote:
>>
>>>Here is the stats table I promised Heinz and others who might be interested.
>>>
>>>Table: Percentage scores needed to conclude one engine is likely to be better
>>>than the other in head-to-head competition
>>>
>>>                     Cut-off (alpha)
>>>Number of games      5%       1%       .1%
>>>10                   75       85       95
>>>20                   67.5     75       80
>>>30                   63.3     70       73.3
>>>40                   62.5     66.3     71.3
>>>50                   61       65       68
>>>75                   58.6     61.3     66
>>>100                  57       60       63
>>>150                  55.7     58.3     60
>>>200                  54.8     57       59.8
>>>300                  54.2     55.8     57.5
>>>500                  53.1     54.3     55.3
>>>1000                 52.2     53.1     54.1
>>>
>>>Notes:
>>>• Based on 10000 randomly chosen samples. Thus, these values are approximate,
>>>though with such a large sample, the values should be close to the “true” value.
>>>• Alpha represents the percentage of the time that the score occurred by chance
>>>(i.e., occurred even though we know the true value to be .50, or 50%). Alpha is
>>>basically the probability of incorrectly saying two engines differ in head-to-head
>>>competition.
>>>• Traditionally, .05 alpha is used as a cut-off, but I think this is a bit too
>>>lenient. I would recommend 1% or .1% to be reasonably confident.
>>>• Draw rate assumed to be .32 (based on CEGT 40/40 draw rates). Variations in
>>>draw rate will slightly affect cut-off levels, but I don't think the difference
>>>will be big.
>>>• Engines assumed to play equal numbers of games as white and black.
>>>• In cases where a particular score fell both above and below the cut-off,
>>>the next score above the cut-off was chosen. This leads to conservative
>>>estimates (e.g., for n of 10, a score of 7 occurred above and below the 5%
>>>cut-off. Therefore, 7.5 became the cut-off).
>>>• Type 1 error = saying an engine is better in head-to-head competition when
>>>there is actually no difference. The chance of making a type 1 error increases
>>>with the number of comparisons you make. If you conduct C comparisons, the
>>>probability of making at least one type 1 error = 1 - (1-alpha)^C (^ = raised
>>>to the power of C).
>>>• It is critical that you choose your sample size ahead of time, and do not
>>>draw any conclusions until you have run the full tournament. It is incorrect,
>>>statistically, to watch the running of the tournament, wait until an engine
>>>reaches a cut-off, and then stop the tournament.
>>
>>Hi Joseph,
>>
>>thanks for your work and your interesting table. We will put it on the CEGT
>>website under ratings and comments.
>>
>>Keep up the good work
>>Heinz

Hi Joseph,

here is the link. Please keep us informed about any changes or additions.

http://www.husvankempen.de/nunn/rating/tablejoseph.htm

Best Regards
Heinz
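
For anyone who wants to sanity-check cut-offs like those in the quoted table, here is a rough sketch of how such values could be simulated. It is not Joseph's actual procedure, which the thread does not show. The assumptions are mine: every game is independent, a draw has probability .32 (the draw rate from the notes), the remaining probability is split evenly between a win and a loss, and colour is ignored. The function names (simulate_scores, cutoff_percent, family_wise_error) are made up for illustration.

```python
import bisect
import random


def simulate_scores(n_games, draw_rate=0.32, n_samples=10_000, seed=0):
    """Simulate matches between two EQUAL engines; return sorted point totals.

    Assumption: games are independent and colour is ignored.
    """
    rng = random.Random(seed)
    win_rate = (1.0 - draw_rate) / 2.0  # equal engines: P(win) == P(loss)
    scores = []
    for _ in range(n_samples):
        points = 0.0
        for _ in range(n_games):
            r = rng.random()
            if r < win_rate:
                points += 1.0   # win
            elif r < win_rate + draw_rate:
                points += 0.5   # draw
            # otherwise a loss: 0 points
        scores.append(points)
    scores.sort()
    return scores


def cutoff_percent(n_games, alpha, **kwargs):
    """Smallest score (as a percentage) that equal engines reach or exceed
    in at most a fraction `alpha` of the simulated matches (one-sided test).

    Returns None if even a perfect score is not rare enough.
    """
    scores = simulate_scores(n_games, **kwargs)
    n = len(scores)
    for half_points in range(2 * n_games + 1):
        s = half_points / 2.0
        # Fraction of simulated matches with a score >= s.
        tail = (n - bisect.bisect_left(scores, s)) / n
        if tail <= alpha:
            return 100.0 * s / n_games
    return None


def family_wise_error(alpha, comparisons):
    """Chance of at least one type 1 error: 1 - (1 - alpha)^C."""
    return 1.0 - (1.0 - alpha) ** comparisons


if __name__ == "__main__":
    for n_games in (10, 50, 100):
        print(n_games, [cutoff_percent(n_games, a) for a in (0.05, 0.01, 0.001)])
    print(round(family_wise_error(0.05, 10), 3))
```

With 10000 samples the results carry some sampling noise, so they should land near the table values rather than match them exactly. For a non-directional hypothesis, pass half of the intended alpha (e.g. .025 for an overall .05), which is the same doubling described in the first added note.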
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.