Computer Chess Club Archives



Subject: Re: revised statistics table;Detecting differences in head to head competiti

Author: chandler yergin

Date: 12:35:56 02/05/06



On February 05, 2006 at 00:03:38, Joseph Ciarrochi wrote:

>In light of the interest in the tables, I decided to redo all the analyses,
>increasing the number of samples from 10000 to 50000 and thereby increasing the
>precision of the estimates. I have inserted the table and notes at the bottom of
>this email (there are only tiny differences between the tables based on 10000
>versus 50000 samples).
>
>
>The table is based on a draw rate of .32, which is what top engines and humans
>tend to get in slower games. However, the draw rate for more average humans
>playing blitz, and for some engines playing blitz, tends to be about .12 (based
>on Internet Chess Club statistics). I recalculated some values using this lower
>draw rate and obtained the following:
>
>
>                 Cut-off (alpha)
>Number of games   5%    1%    .1%
>10                80    90    100
>50                62    66    71
>100               58    61.5  65
>
>
>Comparing this to the table below, the critical values tend to be higher when
>you have a lower draw rate, especially for smaller numbers of games. A lower draw
>rate means greater variability in scores and therefore a greater occurrence of
>extreme scores.
>
>
>
>
>
>
>________________________redone tables___________________________________
>
>
>Percentage scores needed to conclude that one engine is likely to be better than
>the other (draw rate .32)
>
>                 Cut-off (alpha)
>Number of games   5%     1%     .1%
>10                75     85     95
>20                67.5   72.5   80
>30                63.3   68.3   75
>40                62.5   66.3   71.3
>50                60     65     69
>75                58.6   61.3   65.3
>100               57.5   60     63.5
>150               56     58     60.7
>200               55     57     59.3
>300               54     55.7   57.5
>500               53.1   54.4   55.8
>1000              52.2   53.1   54.1
>
>
>
>•  Based on 50000 randomly chosen samples. Thus, these values are approximate,
>though with such a large sample, the values should be close to the “true” value.
>•  Alpha is the percentage of simulated matches in which a score at least this
>high occurred by chance (i.e., even though we know the true expected score is
>.50, or 50%). Alpha is essentially the probability of incorrectly concluding that
>two engines differ in head-to-head competition.
>•  Traditionally, .05 alpha is used as a cut-off, but I think this is a bit too
>lenient. I would recommend 1% or .1% to be reasonably confident.
>•  Draw rate assumed to be .32 (based on CEGT 40/40 draw rates). Variations in
>draw rate will slightly affect cut-off levels, but I don't think the difference
>will be big.
>•  Engines are assumed to play equal numbers of games as White and Black.
>•  Because scores come in half-point steps, the cut-off sometimes fell inside a
>single score value, with some occurrences of that score above it and some below.
>In those cases the next score above the cutoff was chosen, which makes the
>estimates conservative (e.g., for n of 10, a score of 7 fell on both sides of the
>5% cutoff, so 7.5 became the cut-off).
>•  Type 1 error = saying an engine is better in head-to-head competition when
>there is actually no difference. The chance of making a Type 1 error increases
>with the number of comparisons you make. If you conduct C independent comparisons,
>the probability of making at least one Type 1 error is 1 - (1 - alpha)^C, where
>^ means raised to the power of C.
>•  It is critical that you choose your sample size ahead of time, and do not
>draw any conclusions until you have run the full tournament. It is incorrect,
>statistically, to watch the running of the tournament, wait until an engine
>reaches a cut-off, and then stop the tournament.
>•  The values in the table assume that you are testing a directional hypothesis,
>e.g., that engine A does better than B. If you have no idea which engine
>might be better, then your hypothesis is non-directional and you must double the
>alpha rate. This means that if you select the .05 criterion and you have a
>non-directional hypothesis, you are in fact using a .1 criterion, and if you
>choose the .01 criterion, you are using a .02 criterion. I recommend using at
>least the .01 criterion in these instances, and preferably the .1% (.001)
>criterion.
>•  Even if you get a significant result, the result may not generalize well to
>future tests. One important question is: to what extent are the openings you
>used in your test representative of the openings the engine would actually use
>when playing. I think there is no way you can get a representative sample of
>opening positions with only, say, ten openings. You probably need at least 50
>different openings. If you are going to use a particular opening book with an
>engine, it would be ideal to sample a fair number of different openings from
>this opening book.
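Joseph's tables were estimated from 50000 random samples. The same one-sided cut-off can also be computed exactly, by convolving the per-game score distribution n times, as a cross-check on the simulated values. The Python sketch below is an illustration under stated assumptions, not the method used for the tables. It assumes two equally strong engines, with each game independently a draw with probability `draw_rate` and otherwise a win or a loss with equal probability. This ignores the White/Black colour balance, which is a simplification. The function names are hypothetical. It applies the conservative rule from the notes: return the smallest attainable score whose chance of being reached or exceeded is at most alpha.

```python
from typing import List, Optional


def score_distribution(n_games: int, draw_rate: float) -> List[float]:
    """Exact distribution of one engine's total half-points over n_games.

    Assumes two equally strong engines (hypothetical model): each game is a
    draw with probability draw_rate, otherwise a win or a loss with equal
    probability. dist[t] is the probability of scoring exactly t half-points.
    """
    p_decisive = (1.0 - draw_rate) / 2.0
    dist = [1.0]  # before any game: 0 half-points with certainty
    for _ in range(n_games):
        new = [0.0] * (len(dist) + 2)
        for t, p in enumerate(dist):
            new[t] += p * p_decisive      # loss: +0 half-points
            new[t + 1] += p * draw_rate   # draw: +1 half-point
            new[t + 2] += p * p_decisive  # win:  +2 half-points
        dist = new
    return dist


def cutoff_percent(n_games: int, alpha: float,
                   draw_rate: float = 0.32) -> Optional[float]:
    """Smallest percentage score s with P(score >= s) <= alpha (one-sided).

    Returns None if even a perfect score is more likely than alpha.
    """
    dist = score_distribution(n_games, draw_rate)
    tail = 0.0
    cutoff = None
    # Walk down from the maximum score, accumulating the upper tail P(T >= t).
    for t in range(len(dist) - 1, -1, -1):
        tail += dist[t]
        if tail > alpha:
            break
        cutoff = t
    if cutoff is None:
        return None
    return 100.0 * cutoff / (2 * n_games)


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        row = [cutoff_percent(n, a) for a in (0.05, 0.01, 0.001)]
        print(n, row)
```

For 10 games at a draw rate of .32 this gives 75, 85 and 95, the first row of the table. With `draw_rate=0.12`, the .1% cut-off for 10 games comes out at 100, as in the blitz figures. Rows for larger n may differ slightly from the simulated table because of sampling error and the colour simplification.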

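Two of the quoted notes can be checked with a small simulation: the 1 - (1 - alpha)^C formula, and the warning against stopping a match as soon as a cut-off is reached. The Python sketch below is illustrative only and makes several assumptions:

- It uses a normal approximation for the cut-offs instead of the table.
- It assumes two equal engines: draws with probability `draw_rate`, wins and losses equally likely, colours ignored.
- All names and default parameters are hypothetical.

It plays 100-game matches and compares two false-positive rates at a one-sided 5% level (z = 1.645). The first judges the result only at the end. The second assumes the tester peeks every 10 games and stops at the first crossing.

```python
import math
import random
from typing import Tuple


def normal_cutoff(n_games: int, draw_rate: float, z: float) -> float:
    """Approximate one-sided cut-off, as a fraction of the maximum score.

    For two equal engines each game scores 0, 0.5 or 1 with mean 0.5 and
    variance (1 - draw_rate) / 4, so the average over n games has standard
    deviation sqrt((1 - draw_rate) / (4 * n)).
    """
    return 0.5 + z * math.sqrt((1.0 - draw_rate) / (4.0 * n_games))


def familywise_error(alpha: float, comparisons: int) -> float:
    """1 - (1 - alpha)^C: chance of at least one Type 1 error in C independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons


def simulate_peeking(n_games: int = 100, peek_every: int = 10,
                     draw_rate: float = 0.32, z: float = 1.645,
                     trials: int = 4000, seed: int = 1) -> Tuple[float, float]:
    """False-positive rates for two equal engines.

    Returns (rate when judged only after all n_games,
             rate when a tester stops at the first peek that reaches the cut-off).
    """
    rng = random.Random(seed)
    win_threshold = draw_rate + (1.0 - draw_rate) / 2.0
    end_only = 0
    any_peek = 0
    for _ in range(trials):
        points = 0.0
        crossed = False
        for game in range(1, n_games + 1):
            r = rng.random()
            if r < draw_rate:
                points += 0.5          # draw
            elif r < win_threshold:
                points += 1.0          # win; otherwise a loss adds nothing
            if game % peek_every == 0 and points / game >= normal_cutoff(game, draw_rate, z):
                crossed = True         # a peeking tester would stop here
        if points / n_games >= normal_cutoff(n_games, draw_rate, z):
            end_only += 1
        if crossed:
            any_peek += 1
    return end_only / trials, any_peek / trials


if __name__ == "__main__":
    end_rate, peek_rate = simulate_peeking()
    print(f"judged only after 100 games:    {end_rate:.3f}")
    print(f"stopping at any of 10 peeks:    {peek_rate:.3f}")
    print(f"1 - (1 - .05)^10 (independent): {familywise_error(0.05, 10):.3f}")
```

The final check after game 100 is also the last peek, so the peeking rate can never be lower than the end-only rate. The end-only rate is expected to sit close to the nominal 5%, while the peeking rate is noticeably higher. The peeking rate should stay below the 1 - (1 - .05)^10 of about 0.40 that the formula gives, because successive looks at the same growing match are strongly correlated rather than independent.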
I would like your perception/conclusion based on the evidence of the statistics
noted.
Thank You,
cy
CEGT ("Chess Engines Grand Tournament")
The last 2005 posting showed a total of 68319 games
had been played. How did it come out?
White Wins 1-0 25715 (37.6%)
Black Wins 0-1 19580 (28.7%)
Draws 1/2 1/2 23024 (33.7%)
White Perf. = 54.5%
Black Perf. = 45.5%
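The White and Black performance figures count a draw as half a point for each side. Here is a minimal sketch of that arithmetic in Python, checked against the totals just listed. The helper name `performance` is hypothetical.

```python
from typing import Tuple


def performance(white_wins: int, black_wins: int, draws: int) -> Tuple[float, float]:
    """Return (White %, Black %), with each draw worth half a point to both sides."""
    games = white_wins + black_wins + draws
    white = 100.0 * (white_wins + 0.5 * draws) / games
    return white, 100.0 - white


white, black = performance(25715, 19580, 23024)
print(f"White Perf. = {white:.1f}%  Black Perf. = {black:.1f}%")
```

The function takes the sum of the three results as the number of games. Rows whose results do not add up to the stated total, such as some of the database rows below, will therefore give slightly different percentages.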
What are the Stats by ECO Code?
Here you go-
http://web.telia.com/~u85924109/ssdf/
http://home.interact.se/~w100107/openings.htm
OPENINGS
Based on 18 947 games I've looked at some statistics.
The score for white is 55-45 and an average game is 70 moves.
I've divided the games into the most-played openings:
For the comps, games by ECO code:
ECO A = 9068 (13.3%)
ECO B = 17619 (25.8%)
ECO C = 10987 (16.1%)
ECO D = 14939 (21.9%)
ECO E = 12718 (18.6%)
Perf. White = 54.5%
Perf. Black = 45.5%
In my Mega Database 99: Total Games Played = 1114332
Results:
White Wins 1-0 413653 = (37%)
Black Wins 0-1 318394 = (28%)
Draws 1/2 1/2 381453 = (35%)
White Perf. = 54%
Black Perf. = 46%
Here are the Stats from Big Database 2003
Total Games = 2311756
1-0 = 883180 = 38%
0-1 = 700303 = 30%
1/2 1/2 = 726673 = 32%
White perf. = 54%
Black perf. = 46%
TWIC issues taken at random:
TWIC 578 1612 Games
1-0 624 games total 55%
0-1 478 games total 45%
TWIC 580 710 games
1-0 243 games total 55%
0-1 179 games total 45%
TWIC 582 1115 games
1-0 434 games total 55%
0-1 330 games total 45%
TWIC 583 games
1-0 744 games total 55%
0-1 561 games total 45%
TWIC 674 1245 games
1-0 475 games total 55%
0-1 350 games total 45%
Let's see how some Great Players of the Past did.
Marshall, F Total Games played = 1364
1-0 = 523 total 55%
0-1 = 392 total 45%
1/2 1/2 = 448
Alekhine,A Total Games played = 1606
1-0 = 670 total 56%
0-1 = 476 total 44%
1/2 1/2 = 459
Capablanca, J Total Games played = 570
1-0 = 185 total 54%
0-1 = 136 total 46%
1/2 1/2 = 248
Lasker, E Total Games played = 1063
1-0 = 424 total 53%
0-1 = 367 total 47%
1/2 1/2 = 269
Tal, M Total Games played = 2868
1-0 = 922 total 56%
0-1 = 602 total 44%
1/2 1/2 = 1343
Fischer, R Total Games played = 2167
1-0 = 850 total 55%
0-1 = 637 total 45%
1/2 1/2 = 677
Spassky, B Total Games played = 2113
1-0 = 531 total 55%
0-1 = 537 total 45%
1/2 1/2 = 1245
Smyslov, V Total Games played = 2579
1-0 = 741 total 55%
0-1 = 459 total 45%
1/2 1/2 = 1379
White: Perf. 55% with the following ECO Classifications
B12
C99
D30
C99
B48
C06
B90
E58
A30
B50
B70
E21
B22
C45
C24
B04
C99
D47
E02
B33
Black: Perf. 45% with the following ECO Classifications
D24
B96
C91
D46
D30
E97
D31
E15
B81
A09
A12
C01
C34
D58
C82
A22
B92
B42
B61
E15
B92
1/2 1/2 by ECO Classifications
D12
D43
C91
D27
E11
B08
From Mega Database 99 Total games played = 1114332
1-0 = 413653 = 37% White Perf. = 54%
0-1 = 318394 = 28% Black Perf. = 46%
1/2 1/2 = 381453 = 35%

ECO A Total Games Played = 251258
1-0 = 92153 = 36% White Perf. = 53%
0-1 = 74735 = 29% Black Perf. = 47%
1/2 1/2 = 84027 = 35%
ECO B Total games played = 332554
1-0 = 121731 = 36% White Perf. = 53%
0-1 = 103961 = 31% Black Perf. = 47%
1/2 1/2 =106654 = 33%
ECO C Total games played = 194836
1-0 = 75063 = 38% White Perf. = 55%
0-1 = 54853 = 28% Black Perf. = 45%
1/2 1/2 = 64809 = 34%
ECO D Total Games Played = 173791
1-0 = 64067 = 36% White Perf. = 56%
0-1 = 42410 = 24% Black Perf. = 44%
1/2 1/2 = 67220 = 40%
ECO E Total games played = 161736
1-0 = 60515 = 37% White Perf. = 56%
0-1 = 42389 = 26% Black Perf. = 44%
1/2 1/2 = 58721 = 37%
From Big Database 2003.cbh Total Games played = 2311786
1-0 = 883180 = 38% White Perf. = 54%
0-1 = 700303 = 30% Black Perf. = 46%
1/2 1/2 = 726672 = 32%
ECO A Total games played = 251258
1-0 = 92193 = 36% White Perf. = 53%
0-1 = 74735 = 29% Black Perf. = 47%
1/2 1/2 = 84027 = 35%
ECO B Total games played = 332554
1-0 = 121731 = 36% White Perf. = 53%
0-1 = 103961 = 31% Black Perf. = 47%
1/2 1/2 = 106654 = 33%
ECO C Total games played = 194836
1-0 = 75063 = 38% White Perf. = 55%
0-1 = 54853 = 28% Black Perf. = 45%
1/2 1/2 = 64809 = 34%
ECO D Total games played = 173791
1-0 = 64067 = 36% White Perf. = 56%
0-1 = 42410 = 24% Black Perf. = 44%
1/2 1/2 = 67220 = 40%
ECO E Total games played = 161736
1-0 = 60515 = 37% White Perf. = 56%
0-1 = 42389 = 26% Black Perf. = 44%
1/2 1/2 = 58721 = 37%

CEGT 40/40
Downloads and Statistics
December 18, 2005
Total number of games: 63'069
White wins: 23'743 (37.6%)
Black wins: 18'090 (28.7%)
Draws: 21'236 (33.7%)
White score: 54.5%
CEGT Blitz 40/4 only best version
Games : 53806 (finished)
White Wins : 21272 (39.5 %)
Black Wins : 17130 (31.8 %)
Draws : 15404 (28.6 %)
Unfinished : 20
White Perf. : 53.8 %
Black Perf. : 46.2 %
ECO A = 6448 Games (12.0 %)
ECO B = 13184 Games (24.5 %)
ECO C = 6772 Games (12.6 %)
ECO D = 11286 Games (21.0 %)
ECO E = 12093 Games (22.5 %)
generated with ELOStat 1.1c32
CEGT Blitz 40/4 all versions
Games : 53806 (finished)
White Wins : 21272 (39.5 %)
Black Wins : 17130 (31.8 %)
Draws : 15404 (28.6 %)
Unfinished : 20
White Perf. : 53.8 %
Black Perf. : 46.2 %
ECO A = 6448 Games (12.0 %)
ECO B = 13184 Games (24.5 %)
ECO C = 6772 Games (12.6 %)
ECO D = 11286 Games (21.0 %)
ECO E = 12093 Games (22.5 %)
CEGT Blitz 40/4 CM settings
Games : 53806 (finished)
White Wins : 21272 (39.5 %)
Black Wins : 17130 (31.8 %)
Draws : 15404 (28.6 %)
Unfinished : 20
White Perf. : 53.8 %
Black Perf. : 46.2 %
ECO A = 6448 Games (12.0 %)
ECO B = 13184 Games (24.5 %)
ECO C = 6772 Games (12.6 %)
ECO D = 11286 Games (21.0 %)
ECO E = 12093 Games (22.5 %)
CEGT 40/40 repeated best version >= 275 games
Games : 75359 (finished)
White Wins : 28395 (37.7 %)
Black Wins : 21649 (28.7 %)
Draws : 25315 (33.6 %)
Unfinished : 0
White Perf. : 54.5 %
Black Perf. : 45.5 %
ECO A = 10673 Games (14.2 %)
ECO B = 19452 Games (25.8 %)
ECO C = 12373 Games (16.4 %)
ECO D = 16284 Games (21.6 %)
ECO E = 13503 Games (17.9 %)
CEGT 40/40 repeated > 10 games
Games : 75359 (finished)
White Wins : 28395 (37.7 %)
Black Wins : 21649 (28.7 %)
Draws : 25315 (33.6 %)
Unfinished : 0
White Perf. : 54.5 %
Black Perf. : 45.5 %
ECO A = 10673 Games (14.2 %)
ECO B = 19452 Games (25.8 %)
ECO C = 12373 Games (16.4 %)
ECO D = 16284 Games (21.6 %)
ECO E = 13503 Games (17.9 %)






Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.