Computer Chess Club Archives

Subject: table for detecting significant difference between two engines

Author: Joseph Ciarrochi

Date: 16:26:43 02/03/06

Here is the stats table I promised Heinz and others who might be interested.

Table: Percentage scores needed to conclude one engine is likely to be better
than the other in head-to-head competition

                    Cut-off (alpha)
Number of games     5%      1%      0.1%
10                  75      85      95
20                  67.5    75      80
30                  63.3    70      73.3
40                  62.5    66.3    71.3
50                  61      65      68
75                  58.6    61.3    66
100                 57      60      63
150                 55.7    58.3    60
200                 54.8    57      59.8
300                 54.2    55.8    57.5
500                 53.1    54.3    55.3
1000                52.2    53.1    54.1

Notes:
•	Based on 10,000 randomly generated samples. The values are therefore
approximate, though with such a large sample they should be close to the “true”
values.
•	Alpha is the percentage of the time that a score this high occurs by chance
(i.e., occurs even though we know the true value to be .50, or 50%). Alpha is
basically the probability of incorrectly saying two engines differ in
head-to-head competition.
•	Traditionally, an alpha of .05 is used as the cut-off, but I think this is a
bit too lenient. I would recommend 1% or .1% to be reasonably confident.
•	Draw rate is assumed to be .32 (based on CEGT 40/40 draw rates). Variations
in draw rate will slightly affect the cut-off levels, but I don't think the
difference will be big.
•	Engines are assumed to play equal numbers of games as White and Black.
•	In cases where a particular score fell both above and below the cut-off, the
next score above the cut-off was chosen. This leads to conservative estimates
(e.g., for n of 10, a score of 7 occurred both above and below the 5% cut-off,
so 7.5 became the cut-off).
•	Type 1 error = saying an engine is better in head-to-head competition when
there is actually no difference. The chance of making a type 1 error increases
with the number of comparisons you make. If you conduct C independent
comparisons, the probability of making at least one type 1 error
= 1 - (1 - alpha)^C (^ = raised to the power of).
•	It is critical that you choose your sample size ahead of time and do not
draw any conclusions until you have run the full tournament. It is statistically
incorrect to watch the tournament as it runs, wait until an engine reaches a
cut-off, and then stop the tournament.
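
For anyone who wants to regenerate or extend the table, the procedure described
in the notes can be sketched in Python roughly as follows. This is a minimal
sketch, not the script used to produce the table above. The function names, the
fixed seed, and the treatment of every game as an independent draw with
win/draw/loss probabilities .34/.32/.34 (which ignores any White advantage) are
assumptions made for illustration. The cut-off rule is one reading of the "next
score above the cut-off" note: the lowest score whose simulated tail frequency
(that score or better) does not exceed alpha. The second function is just the
1 - (1 - alpha)^C formula from the type 1 error note.

```python
import random
from collections import Counter

DRAW_RATE = 0.32  # draw rate quoted in the notes (CEGT 40/40)


def simulate_cutoff(n_games, alpha, n_samples=10000, draw_rate=DRAW_RATE, seed=0):
    """Estimate the percentage score an engine must reach in n_games before
    the result is rarer than alpha under the 'engines are equal' assumption.

    Hypothetical helper: each game is an independent draw, colours ignored.
    """
    rng = random.Random(seed)
    win = (1.0 - draw_rate) / 2.0      # equal engines: win and loss equally likely
    outcomes = (2, 1, 0)               # half-points: win = 2, draw = 1, loss = 0
    weights = (win, draw_rate, win)

    # Tally the total score (in half-points) of each simulated match.
    counts = Counter(
        sum(rng.choices(outcomes, weights=weights, k=n_games))
        for _ in range(n_samples)
    )

    # Walk down from a perfect score, accumulating the upper-tail frequency.
    # Keep the lowest score whose tail frequency is still <= alpha.
    max_score = 2 * n_games
    cutoff = max_score + 1             # sentinel: even a perfect score is too common
    tail = 0
    for score in range(max_score, -1, -1):
        tail += counts.get(score, 0)
        if tail / n_samples > alpha:
            break
        cutoff = score
    return 100.0 * cutoff / max_score


def familywise_error(alpha, comparisons):
    """Chance of at least one type 1 error in `comparisons` independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        row = [simulate_cutoff(n, a) for a in (0.05, 0.01, 0.001)]
        print(n, row)
    print(familywise_error(0.05, 10))  # about 0.40
```

Because the cut-offs come from a finite random sample, a rerun with a different
seed can land one half-point either side of a tabled value, especially for the
larger numbers of games and the .1% column.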

Re: table for detecting significant difference between two engines Heinz van Kempen 01:58:46 02/04/06
- a few qualifications to add in the notes at the end of the table Joseph Ciarrochi 02:53:37 02/04/06
  - Re: a few qualifications to add in the notes at the end of the table Heinz van Kempen 06:25:17 02/04/06
Re: table for detecting significant difference between two engines Vasik Rajlich 01:13:17 02/04/06
- Re: table for detecting significant difference between two engines Joseph Ciarrochi 02:13:20 02/04/06
  - Re: table for detecting significant difference between two engines Vasik Rajlich 03:30:37 02/05/06
    - Re: table for detecting significant difference between two engines Uri Blass 17:01:02 02/05/06
      - Re: table for detecting significant difference between two engines Vasik Rajlich 08:41:05 02/06/06
      - Re: table for detecting significant difference between two engines Joseph Ciarrochi 04:39:23 02/06/06
        - a gambling metaphor: How would you bet? Joseph Ciarrochi 04:51:24 02/06/06
        - Re: a gambling metaphor: How would you bet? Vasik Rajlich 08:34:49 02/06/06
Re: table for detecting significant difference between two engines Kirill Kryukov 18:09:11 02/03/06
- Re: table for detecting significant difference between two engines Joseph Ciarrochi 18:11:48 02/03/06
  - Re: table for detecting significant difference between two engines Kirill Kryukov 18:37:09 02/03/06
    - using bayselo versus elostat Joseph Ciarrochi 19:53:56 02/03/06
      - Re: using bayselo versus elostat Kirill Kryukov 21:12:17 02/03/06
        - Re: using bayselo versus elostat; error estimates differ Joseph Ciarrochi 21:43:53 02/03/06
        - Re: using bayselo versus elostat; error estimates differ Kirill Kryukov 21:53:59 02/03/06
        - Re: using bayselo versus elostat; error estimates differ Joseph Ciarrochi 21:59:35 02/03/06
        - Re: using bayselo versus elostat; error estimates differ Kirill Kryukov 22:26:08 02/03/06
      - Re: using bayselo versus elostat Michael Yee 20:05:43 02/03/06
      - Re: using bayselo versus elostat Dann Corbit 20:05:42 02/03/06
    - Re: table for detecting significant difference between two engines Dann Corbit 18:43:26 02/03/06
Re: table for detecting significant difference between two engines Alessandro Scotti 17:10:23 02/03/06
- yes, by all means, use the table wherever you think it is useful nt Joseph Ciarrochi 18:12:59 02/03/06
  - Re: yes, by all means, use the table wherever you think it is useful nt Alessandro Scotti 15:51:05 02/04/06
    - Re: yes, by all means, use the table wherever you think it is useful nt Joseph Ciarrochi 19:29:51 02/04/06
