Computer Chess Club Archives


Subject: a few qualifications to add in the notes at the end of the table

Author: Joseph Ciarrochi

Date: 02:53:37 02/04/06


Sounds good, Heinz.

I think Vasik raises some good points, which suggest that I should add a couple
more notes to make sure people don't misuse the table.




In addition to the other notes, I would add the following:


** The values in the table assume that you are testing a directional hypothesis,
e.g., that engine A does better than B. If you have no idea which engine
might be better, then your hypothesis is non-directional and you must double the
alpha rate. This means that if you select the .05 criterion and you have a
non-directional hypothesis, you are in fact using a .1 criterion, and if you
choose the .01 criterion, you are in fact using a .02 criterion. I recommend
using at least the .01 criterion in these instances, and preferably the .001
(.1%) criterion.
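
The doubling can be checked with a small calculation. What follows is only a sketch under stated assumptions: two exactly equal engines, the .32 draw rate from the quoted table notes, and independent games. The function names are made up for illustration. It computes the exact chance of reaching a cut-off in one direction (A beats B) and in either direction (A or B looks better).

```python
def score_distribution(n_games, draw_rate=0.32):
    """Exact distribution of total half-points when both engines are equal."""
    p_win = p_loss = (1.0 - draw_rate) / 2.0
    dist = [1.0]  # dist[h] = P(total half-points == h) after 0 games
    for _ in range(n_games):
        new = [0.0] * (len(dist) + 2)
        for h, p in enumerate(dist):
            new[h] += p * p_loss         # loss adds 0 half-points
            new[h + 1] += p * draw_rate  # draw adds 1 half-point
            new[h + 2] += p * p_win      # win adds 2 half-points
        dist = new
    return dist


def tail_probabilities(n_games, cutoff_percent, draw_rate=0.32):
    """Return (one_sided, two_sided) chance of reaching the cut-off by luck.

    cutoff_percent must be above 50, otherwise the two tails overlap.
    """
    dist = score_distribution(n_games, draw_rate)
    max_half = 2 * n_games
    cutoff_half = cutoff_percent / 100.0 * max_half
    eps = 1e-9  # guards against floating-point noise at the boundary
    one_sided = sum(p for h, p in enumerate(dist) if h >= cutoff_half - eps)
    other_side = sum(
        p for h, p in enumerate(dist) if h <= max_half - cutoff_half + eps
    )
    return one_sided, one_sided + other_side


# 100 games with the 57% cut-off from the 5% column of the table.
one, two = tail_probabilities(100, 57.0)
print(f"one-sided: {one:.3f}  two-sided: {two:.3f}")
```

Because wins and losses are equally likely under these assumptions, the two-sided figure is exactly twice the one-sided figure, which is the doubling described above.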



** Even if you get a significant result, the result may not generalize well to
future tests. One important question is: to what extent are the openings you
used in your test representative of the openings the engine would actually
play? I think there is no way you can get a representative sample of
opening positions with only, say, ten openings. You probably need at least 50
different openings. If you are going to use a particular opening book with an
engine, it would be ideal to sample a fair number of different openings from
this opening book.
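
Drawing the openings at random from the book, rather than hand-picking them, is one way to do this. The sketch below is hypothetical throughout: the book is just a list of placeholder line names, and `sample_openings` and `both_colours` are illustrative helpers, not part of any engine or tournament tool. Each sampled opening is played once with each colour, which also matches the equal white/black assumption in the table notes.

```python
import random


def sample_openings(book, n_openings=50, seed=None):
    """Pick n_openings distinct lines at random from the book."""
    if n_openings > len(book):
        raise ValueError("book has fewer lines than requested")
    rng = random.Random(seed)
    return rng.sample(book, n_openings)


def both_colours(openings, engine_a="A", engine_b="B"):
    """Schedule each opening twice so both engines get White once."""
    schedule = []
    for opening in openings:
        schedule.append((opening, engine_a, engine_b))  # engine_a is White
        schedule.append((opening, engine_b, engine_a))  # engine_b is White
    return schedule


# Placeholder book: in practice these would be lines exported from the
# opening book the engine will actually use.
book = [f"line_{i:03d}" for i in range(200)]
chosen = sample_openings(book, 50, seed=2006)
games = both_colours(chosen)
print(len(chosen), "openings ->", len(games), "games")
```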




On February 04, 2006 at 04:58:46, Heinz van Kempen wrote:

>On February 03, 2006 at 19:26:43, Joseph Ciarrochi wrote:
>
>>Here is the stats table I promised Heinz and others who might be interested.
>>
>>
>>
>>
>>
>>
>>Table: Percentage scores needed to conclude one engine is likely to be better
>>than the other in head-to-head competition
>>
>>                   Cut-off (alpha)
>>Number of games    5%      1%      .1%
>>10                 75      85      95
>>20                 67.5    75      80
>>30                 63.3    70      73.3
>>40                 62.5    66.3    71.3
>>50                 61      65      68
>>75                 58.6    61.3    66
>>100                57      60      63
>>150                55.7    58.3    60
>>200                54.8    57      59.8
>>300                54.2    55.8    57.5
>>500                53.1    54.3    55.3
>>1000               52.2    53.1    54.1
>>
>>Notes:
>>•	Based on 10000 randomly chosen samples. Thus, these values are approximate,
>>though with such a large sample, the values should be close to the “true” value.
>>•	Alpha represents the percentage of time that the score occurred by chance
>>(i.e., occurred even though we know the true value to be .50, or 50%). Alpha is
>>basically the probability of incorrectly saying two engines differ in
>>head-to-head competition.
>>•	Traditionally, .05 alpha is used as a cut-off, but I think this is a bit too
>>lenient. I would recommend 1% or .1% to be reasonably confident.
>>•	Draw rate assumed to be .32 (based on CEGT 40/40 draw rates). Variations in
>>draw rate will slightly affect cut-off levels, but I don't think the difference
>>will be big.
>>•	Engines assumed to play equal numbers of games as white and black.
>>•	In cases where a particular score fell both above and below the cut-off,
>>the next score above the cut-off was chosen. This leads to conservative
>>estimates (e.g., for n of 10, a score of 7 occurred above and below the 5%
>>cut-off; therefore, 7.5 became the cut-off).
>>•	Type 1 error = saying an engine is better in head-to-head competition when
>>there is actually no difference. The chance of making a type 1 error increases
>>with the number of comparisons you make. If you conduct C comparisons, the
>>probability of making at least one type 1 error = 1 - (1-alpha)^C (^ = raised
>>to the power of).
>>•	It is critical that you choose your sample size ahead of time and do not
>>draw any conclusions until you have run the full tournament. It is incorrect,
>>statistically, to watch the running of the tournament, wait until an engine
>>reaches a cut-off, and then stop the tournament.
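
For anyone who wants to regenerate cut-offs like those in the quoted table, here is a minimal Monte Carlo sketch. It is one reading of the notes above (10000 samples, equal engines, draw rate .32, the more conservative score taken as the cut-off) and not necessarily the exact procedure used for the table. The function names, the seed, and the tie rule (smallest score whose upper-tail frequency is at most alpha) are assumptions.

```python
import random


def simulate_totals(n_games, draw_rate=0.32, n_samples=10_000, seed=1):
    """Simulate n_samples matches between equal engines.

    Returns the total score of engine A in half-points for each match.
    """
    rng = random.Random(seed)
    p_win = (1.0 - draw_rate) / 2.0
    totals = []
    for _ in range(n_samples):
        half_points = 0
        for _ in range(n_games):
            r = rng.random()
            if r < p_win:
                half_points += 2  # win
            elif r < p_win + draw_rate:
                half_points += 1  # draw
        totals.append(half_points)
    return totals


def cutoff_percent(totals, n_games, alpha):
    """Smallest percentage score reached by chance in at most alpha of samples."""
    n_samples = len(totals)
    for cutoff in range(n_games + 1, 2 * n_games + 1):
        tail = sum(1 for t in totals if t >= cutoff) / n_samples
        if tail <= alpha:
            return 100.0 * cutoff / (2 * n_games)
    return None  # even a perfect score occurs more often than alpha


totals = simulate_totals(50)
for alpha in (0.05, 0.01, 0.001):
    print(f"n=50 alpha={alpha}: {cutoff_percent(totals, 50, alpha)}")
```

With a different seed the result can move by one half-point step, especially in the .1% column, where only about ten of the 10000 samples fall in the tail.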
>
>Hi Joseph,
>
>thanks for your work and your interesting table. We will put it on the CEGT
>website under ratings and comments.
>
>Keep up the good work
>Heinz
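
The last two notes in the quoted table (multiple comparisons and stopping early) both describe ways the real error rate ends up above the nominal alpha. The sketch below illustrates each under stated assumptions: equal engines, a .32 draw rate, independent comparisons for the formula, and a one-directional test using the 5% cut-offs copied from the table for 10 to 100 games. The function names and the set of "looks" are made up for illustration.

```python
import random

# 5% cut-offs (percentage score) copied from the quoted table.
CUTOFFS_5PCT = {10: 75.0, 20: 67.5, 30: 63.3, 40: 62.5,
                50: 61.0, 75: 58.6, 100: 57.0}


def familywise_error(alpha, comparisons):
    """Chance of at least one type 1 error: 1 - (1 - alpha)^C."""
    return 1.0 - (1.0 - alpha) ** comparisons


def false_positive_rates(n_trials=5000, draw_rate=0.32, seed=1):
    """Compare a fixed 100-game test with stopping at the first cut-off hit.

    Returns (fixed_rate, peeking_rate) for engine A versus an equal engine B.
    """
    rng = random.Random(seed)
    p_win = (1.0 - draw_rate) / 2.0
    final_n = max(CUTOFFS_5PCT)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_trials):
        half_points = 0
        crossed_any = False
        crossed_final = False
        for game in range(1, final_n + 1):
            r = rng.random()
            if r < p_win:
                half_points += 2  # win
            elif r < p_win + draw_rate:
                half_points += 1  # draw
            if game in CUTOFFS_5PCT:
                percent = 100.0 * half_points / (2 * game)
                if percent >= CUTOFFS_5PCT[game]:
                    crossed_any = True
                    if game == final_n:
                        crossed_final = True
        fixed_hits += crossed_final
        peeking_hits += crossed_any
    return fixed_hits / n_trials, peeking_hits / n_trials


for c in (1, 5, 10, 20):
    print(f"{c} comparisons at alpha .05: {familywise_error(0.05, c):.3f}")

fixed, peeking = false_positive_rates()
print(f"fixed 100 games: {fixed:.3f}  stop at first cut-off: {peeking:.3f}")
```

The peeking rate can never be lower than the fixed rate here, because the final 100-game check is one of the looks; how much higher it comes out depends on how often you look.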



