Author: Kurt Utzinger
Date: 01:04:52 03/09/02
On March 09, 2002 at 01:01:02, TEERAPONG TOVIRAT wrote:
>Hi,
>
>How many games I've to test between 2 versions. I think it varies as
>the score ratio. Suppose program A beats program B 4-1.Can I say
>A is superior to B? Or the number is too small?
>
>Thanks for any comment,
>Teerapong
As the example below shows, even a 25-15 result over 40 games does not mean much:
the 95 % error margins (+98/-89 Elo) are about as large as the 88-point gap between
the two performances. To say anything concrete you need at least 100 games,
preferably 200-300.
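To get a feel for how the uncertainty shrinks with the number of games, here is a rough Python sketch. It is not Elostat's method: it assumes two evenly matched programs, the 35 % draw ratio seen in this match, a normal approximation for the match score, and the usual logistic Elo formula. The function name `elo_margin` and its parameters are made up for this illustration.

```python
import math

def elo_margin(n_games, draw_ratio=0.35, z=1.96):
    """Approximate 95 % margin (in Elo) for a match between equal programs.

    Assumptions (not taken from Elostat): expected score 0.5, wins and
    losses equally likely, normal approximation, logistic Elo model.
    n_games must be large enough that 0.5 + half stays below 1.
    """
    # Per-game variance of the score when the expected score is 0.5:
    # wins and losses each deviate by 0.5, draws do not deviate at all.
    var = 0.25 * (1.0 - draw_ratio)
    # Half-width of the confidence interval on the score fraction.
    half = z * math.sqrt(var / n_games)
    # Convert the upper end of the score interval into an Elo difference.
    return -400.0 * math.log10(1.0 / (0.5 + half) - 1.0)

for n in (40, 100, 200, 300):
    print(f"{n:4d} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions the margin is close to 90 Elo at 40 games and drops to roughly 30 Elo at 300 games, which is in line with the advice above.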
<pre>Individual statistics:
(1) A : 40 (+ 18,= 14,-  8), 62.5 %
    B : 40 (+ 18,= 14,-  8), 62.5 %
(2) B : 40 (+  8,= 14,- 18), 37.5 %
    A : 40 (+  8,= 14,- 18), 37.5 %</pre>
<pre>
  Program   Score         % Av.Op.    Elo    +    -   Draws
1 A       : 25.0/ 40   62.5   2356   2444   98   89   35.0 %
2 B       : 15.0/ 40   37.5   2444   2356   89   98   35.0 %</pre>
Generated by Elostat v1.1 from Frank Schubert:
[Start ELO = 2400]
This file contains the most important and central result of the calculations,
the rating list, arranged by Elo performances:
The separate columns give the (mean) Elo performance, the + and - margins of
error given with 95 % confidence, the number of finished games, the relative
score given in percentages, the average opponent Elo and finally, the relative
number of draws for each program.
[From the readme.rtf of Frank Schubert's Elostat v1.1]
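For anyone who wants to check such numbers by hand, the following Python sketch estimates the Elo difference and a 95 % interval from a win/draw/loss record. It is only an approximation under stated assumptions (logistic Elo formula, normal approximation for the score), not Elostat's actual algorithm, and the names `elo_from_score` and `match_summary` are invented here.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (elo_diff, lower, upper) for one side of a match.

    Assumption: normal approximation for the score fraction; the interval
    is only meaningful while p - half and p + half stay inside (0, 1).
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n  # score fraction, e.g. 25/40 = 0.625
    # Per-game variance of the result (1, 0.5 or 0) around the mean p.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    half = z * math.sqrt(var / n)  # half-width of the score interval
    return elo_from_score(p), elo_from_score(p - half), elo_from_score(p + half)

# Program A's record from the match above: +18 =14 -8.
diff, low, high = match_summary(18, 14, 8)
print(f"Elo difference: {diff:+.0f} (95% interval {low:+.0f} to {high:+.0f})")
```

For the 25-15 result this gives a difference of about 89 Elo, close to the 88-point gap in the rating list, with a lower bound only a few points above zero. The margins differ slightly from Elostat's +98/-89 because the method is not the same.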
Kind regards
Kurt