Computer Chess Club Archives


Subject: Re: a few qualifications to add in the notes at the end of the table

Author: Heinz van Kempen

Date: 06:25:17 02/04/06


On February 04, 2006 at 05:53:37, Joseph Ciarrochi wrote:

>Sounds good, Heinz.
>
>I think Vasik raises some good points and suggests that I should add a couple
>more notes to make sure people don't misuse the table.
>
>
>
>
>In addition to the other notes, I would add the following:
>
>
>** The values in the table assume that you are testing a directional hypothesis,
>e.g., that engine A does better than B. If you have no idea which engine
>might be better, then your hypothesis is non-directional and you must double the
>alpha rate. This means that if you select the .05 criterion and you have a
>non-directional hypothesis, you are in fact using a .1 criterion, and if you
>choose the .01 criterion, you are using a .02 criterion. I recommend using at
>least the .01 criterion in these instances, and preferably the .1% (.001)
>criterion.
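
To illustrate the doubling, here is a minimal Python sketch. It assumes two equally strong engines and the .32 draw rate from the table notes, and it computes the exact score distribution rather than simulating it. The function names are made up for this example and are not part of Joseph's procedure.

```python
def score_distribution(n_games, draw_rate=0.32):
    """Exact distribution of total half-points for two equal engines.

    Returns a list where entry k is P(total half-points == k).
    """
    win = loss = (1.0 - draw_rate) / 2.0
    dist = [1.0]  # before any game is played, the total is 0 for sure
    for _ in range(n_games):
        new = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            new[k] += p * loss            # loss adds 0 half-points
            new[k + 1] += p * draw_rate   # draw adds 1 half-point
            new[k + 2] += p * win         # win adds 2 half-points
        dist = new
    return dist


def tail_probabilities(n_games, score_pct, draw_rate=0.32):
    """Return (one_sided, two_sided) chance of a score at least this extreme.

    score_pct is the percentage score of engine A and should be above 50.
    """
    dist = score_distribution(n_games, draw_rate)
    max_half = 2 * n_games
    threshold = score_pct / 100.0 * max_half
    # Directional: A scores score_pct or more.
    upper = sum(p for k, p in enumerate(dist) if k >= threshold - 1e-9)
    # Non-directional: also count B scoring score_pct or more.
    lower = sum(p for k, p in enumerate(dist)
                if k <= max_half - threshold + 1e-9)
    return upper, upper + lower


one_sided, two_sided = tail_probabilities(100, 57)
print(one_sided, two_sided)
```

Because the null distribution is symmetric around 50%, the non-directional value comes out as twice the directional one, which is the doubling described above.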
>
>
>
>** Even if you get a significant result, the result may not generalize well to
>future tests. One important question is: to what extent are the openings you
>used in your test representative of the openings the engine would actually use
>when playing? I think there is no way you can get a representative sample of
>opening positions with only, say, ten openings. You probably need at least 50
>different openings. If you are going to use a particular opening book with an
>engine, it would be ideal to sample a fair number of different openings from
>this opening book.
>
>
>
>
>On February 04, 2006 at 04:58:46, Heinz van Kempen wrote:
>
>>On February 03, 2006 at 19:26:43, Joseph Ciarrochi wrote:
>>
>>>Here is the stats table I promised Heinz and others who might be interested.
>>>
>>>
>>>
>>>
>>>
>>>
>>>Table: Percentage scores needed to conclude one engine is likely to be better
>>>than the other in head to head competition
>>>
>>>                     Cut-off (alpha)
>>>Number of games      5%      1%      .1%
>>>10                   75      85      95
>>>20                   67.5    75      80
>>>30                   63.3    70      73.3
>>>40                   62.5    66.3    71.3
>>>50                   61      65      68
>>>75                   58.6    61.3    66
>>>100                  57      60      63
>>>150                  55.7    58.3    60
>>>200                  54.8    57      59.8
>>>300                  54.2    55.8    57.5
>>>500                  53.1    54.3    55.3
>>>1000                 52.2    53.1    54.1
>>>
>>>Notes:
>>>•	Based on 10000 randomly chosen samples. Thus, these values are approximate,
>>>though with such a large sample, the values should be close to the “true” value.
>>>•	Alpha represents the percentage of time that the score occurred by chance
>>>(i.e., occurred even though we know the true value to be .50, or 50%). Alpha is
>>>basically the probability of incorrectly saying two engines differ in head to
>>>head competition.
>>>•	Traditionally, .05 alpha is used as a cut-off, but I think this is a bit too
>>>lenient. I would recommend 1% or .1% to be reasonably confident.
>>>•	Draw rate assumed to be .32 (based on CEGT 40/40 draw rates). Variations in
>>>draw rate will slightly affect cut-off levels, but I don't think the difference
>>>will be big.
>>>•	Engines assumed to play equal numbers of games as white and black
>>>•	In cases where a particular score fell both above and below the cutoff, the
>>>next score above the cutoff was chosen. This leads to conservative
>>>estimates (e.g., for n of 10, a score of 7 occurred above and below the 5%
>>>cutoff. Therefore, 7.5 became the cut-off).
>>>•	Type 1 error = saying an engine is better in head to head competition, when
>>>there is actually no difference. The chance of making a type 1 error increases
>>>with the number of comparisons you make. If you conduct C comparisons, the
>>>probability of making at least one type 1 error = 1 – (1-alpha)^C. (^ = raised
>>>to the power of.)
>>>•	It is critical that you choose your sample size ahead of time, and do not
>>>draw any conclusions until you have run the full tournament. It is incorrect,
>>>statistically, to watch the running of the tournament, wait until an engine
>>>reaches a cut-off, and then stop the tournament.
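
For anyone who wants to reproduce cut-offs of this kind, here is a rough Python sketch of a simulation along the lines described in the notes: 10000 random samples, equal engines, draw rate .32, and the conservative rule of taking the next score above the cutoff. It is only an approximation of the idea, not Joseph's actual procedure. The function name, the seed, and the fact that colours are ignored are assumptions of this example.

```python
import bisect
import random


def simulate_cutoff(n_games, alpha, draw_rate=0.32, n_samples=10000, seed=0):
    """Estimate the percentage score needed at a one-sided alpha level.

    Assumes two equally strong engines (true score 50%) and ignores
    colours. Returns the smallest percentage score that was reached or
    exceeded in at most a fraction alpha of the simulated matches.
    """
    rng = random.Random(seed)
    win = (1.0 - draw_rate) / 2.0
    outcomes = [0, 1, 2]              # half-points for loss, draw, win
    weights = [win, draw_rate, win]

    # Total half-points of engine A in each simulated match, sorted.
    totals = sorted(
        sum(rng.choices(outcomes, weights=weights, k=n_games))
        for _ in range(n_samples)
    )

    # Walk upwards from the 50% score until the tail is small enough.
    for t in range(n_games, 2 * n_games + 2):
        count_ge = n_samples - bisect.bisect_left(totals, t)
        if count_ge / n_samples <= alpha:
            # A value above 100 means no possible score is rare enough.
            return 100.0 * t / (2 * n_games)


for alpha in (0.05, 0.01, 0.001):
    print(100, alpha, simulate_cutoff(100, alpha))
```

Results will vary a little with the seed and the number of samples, so small differences from the table are to be expected.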
>>
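
The formula for the chance of at least one type 1 error over C comparisons is easy to check numerically. A small Python sketch follows; the function name is just for illustration, and the formula assumes the comparisons are independent.

```python
def familywise_error(alpha, comparisons):
    """Chance of at least one type 1 error in independent comparisons."""
    return 1.0 - (1.0 - alpha) ** comparisons


# With the 5% cut-off and 10 engine pairings, the chance of at least one
# false "engine A is better" result is roughly 40%.
print(familywise_error(0.05, 10))
```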
>>Hi Joseph,
>>
>>thanks for your work and your interesting table. We will put it on the CEGT
>>website under ratings and comments.
>>
>>Keep up the good work
>>Heinz

Hi Joseph,

here is the link. Please keep us informed about any changes or additions.

http://www.husvankempen.de/nunn/rating/tablejoseph.htm

Best Regards
Heinz



Last modified: Thu, 15 Apr 21 08:11:13 -0700
