Author: Christophe Theron
Date: 01:07:44 01/28/00
On January 27, 2000 at 22:55:51, Michael Neish wrote:
>
>Hi,
>
>Before I get flamed, by "dummy" I mean fake. I'm not calling anyone stupid. :)
>
>Anyone could do this in a few minutes. I ran a Cadaques-style tournament
>between seven fictitious computer programs, i.e., seven programs play each other
>over matches consisting of 20 games each, 420 games in total for the whole
>tournament.
>
>I made the following assumptions:
>
>1) The computers are all of equal strength.
>
>2) The probabilities of a win, a draw and a loss are one-third each.
>
>These assumptions are for simplicity's sake only. If anyone can suggest better
>win/draw/loss probabilities please let me know, although for the sake of this
>post I don't think they make much difference.
(big snip)
That's funny. A few days ago I had a talk with Marcus Kästner (ChessBits) and
explained to him that a great many games are needed in order to determine which
of two programs is the better one.
Eventually I wrote this simple QBasic program to make my argument more
striking.
The program makes the same assumptions as you do about the percentages of
losses, draws and wins.
It simulates a number of repeated matches between 2 programs OF ABSOLUTELY EQUAL
STRENGTH. It asks you for the number of matches (tournaments, if you like), the
number of games in each match, and can even compute the probability that the
error margin is above a certain level (the error is simply the distance between
the match score in percent and 50%, as the expected outcome is always 50%).
It is even possible with this program to estimate how many games must be played
before you can tell that one program is better than its opponent by a given elo
margin.
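
The listing further down converts a percentage error into an elo difference by multiplying it by 7. As a rough illustration of where that factor comes from, here is a small Python sketch (Python rather than QBasic, so it can be run on a current machine). It assumes the usual logistic Elo formula, whose slope at a 50% score is 400 / (ln(10) * 0.25), about 6.95 elo per percentage point. The function names are made up for this example, and `games_needed` is a normal-approximation shortcut, not the trial-and-error procedure the program itself relies on.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_linear(percent):
    """Shortcut used in the listing: 7 elo per percentage point away from 50%."""
    return 7.0 * (percent - 50.0)

def games_needed(elo_margin, z=1.96, var_per_game=1.0 / 6.0):
    """Rough number of games so that a z-sigma band around 50% is narrower
    than the score shift that corresponds to elo_margin.

    Assumption: per-game variance of 1/6, which is what win/draw/loss
    probabilities of one-third each give for results of 1, 0.5 and 0.
    """
    expected = 1.0 / (1.0 + 10.0 ** (-elo_margin / 400.0))
    shift = expected - 0.5
    return math.ceil(z * z * var_per_game / (shift * shift))

# Compare the exact conversion with the "times 7" shortcut.
for pct in (51, 55, 60, 70):
    print(pct, round(elo_from_score(pct / 100.0), 1), elo_linear(pct))

# Games needed to resolve a 35 elo margin at roughly 95% (z = 1.96).
print(games_needed(35))
```

Near 50% the two conversions agree closely (a 55% score is about 35 elo either way), while far from 50% the linear shortcut underestimates the difference. Under these assumptions the estimate for a 35 elo margin at z = 1.96 comes out somewhat above 200 games.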
If you are interested, just play with this program. It is a QBasic program. You
need to copy the OLDMSDOS directory of your Windows 95/98 CD to the
C:\WINDOWS\COMMAND directory in order to be able to run it.
To run it:
* open a DOS box
* go into the directory where you have copied the following RNDMATCH.BAS file
* type
QBASIC /RUN RNDMATCH
When the program has finished, press F5 to run it again.
I have just run it. My sample is 1000 matches of 200 games each.
My program tells me that with 200 games I can only be sure that one program is
stronger if the elo difference between the two is above 35 points, and even that
holds with only 93.5% confidence.
If the programs are closer than 35 elo points, 200 games are not enough to be
sure which is better.
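
For readers without a DOS box at hand, the experiment can be approximated with the Python sketch below. It is a loose port of the RNDMATCH.BAS listing, not the original program: it uses Python's `random` module instead of QBasic's RND, the names `simulate`, `prob_over_limit` and so on are invented for this example, and scores are counted in integer half-points so that the comparison against the limit does not depend on floating-point rounding.

```python
import random

def simulate(nb_matches, nb_games, limit, seed=None):
    """Play nb_matches matches of nb_games games between two equal players.

    Each game is a loss, draw or win for player A with probability 1/3 each.
    limit is the error threshold in percentage points (distance from 50%).
    """
    rng = random.Random(seed)
    over_limit = 0
    total_diff = 0.0
    max_diff = 0.0
    for _ in range(nb_matches):
        # Score of player A in half-points: loss = 0, draw = 1, win = 2.
        half_points = sum(rng.randrange(3) for _ in range(nb_games))
        # A 50% score corresponds to half_points == nb_games.
        deviation = abs(half_points - nb_games)
        diff = deviation * 50.0 / nb_games  # error in percentage points
        total_diff += diff
        max_diff = max(max_diff, diff)
        # Same test as "diff > limit", written without the division.
        if deviation * 50 > limit * nb_games:
            over_limit += 1
    return {
        "max_error": max_diff,
        "average_error": total_diff / nb_matches,
        "prob_over_limit": over_limit / nb_matches,
    }

# Same inputs as the run described above: 1000 matches, 200 games, 5% limit.
print(simulate(1000, 200, 5))
```

The numbers change from run to run unless a seed is passed. As a sanity check on the order of magnitude: with these probabilities the standard deviation of a single game is sqrt(1/6), about 0.408 points, so over 200 games the score percentage has a standard deviation of about 2.9 points, and an error of 5 points is roughly 1.7 standard deviations.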
Number of matches: 1000
Number of games in each match: 200
Compute probability of error greater than: 5
Christophe
Quick Basic listing of RNDMATCH.BAS:
(I don't claim any copyright on it, feel free to spread it everywhere)
======= Start to copy from the line below ======
CLS
PRINT "*** Simulation of chess matches ***"
PRINT
PRINT "We assume that opponents are exactly of equal strength."
PRINT "So each has an expected score of exactly 50%."
PRINT
RANDOMIZE TIMER
INPUT "Number of matches to play "; nbmatch
INPUT "Number of games in each match "; nbgames
INPUT "Compute probability of error greater than "; limit
totdiff = 0
maxdiff = 0
nbmax = 0
nboverlimit = 0
FOR match = 1 TO nbmatch
  total = 0 ' total is the score of player A
  FOR game = 1 TO nbgames
    result = INT(RND * 300) ' result between 0 and 299 included
    IF result < 100 THEN
      ' DRAW
      'PRINT "1/2 - 1/2"
      total = total + .5
    ELSE
      IF result < 200 THEN
        ' PLAYER A LOSES: total stays unchanged
        'PRINT " 0 - 1"
      ELSE
        ' PLAYER A WINS
        'PRINT " 1 - 0"
        total = total + 1
      END IF
    END IF
  NEXT game
  percent = total / nbgames * 100
  diff = ABS(percent - 50)
  PRINT match;
  LOCATE CSRLIN, 1
  'PRINT "Outcome of the match: "; total; "-"; nbgames - total; " (";
  'PRINT USING "###.##"; percent;
  'PRINT "%, error=";
  'PRINT USING "##.##"; diff;
  'PRINT " )"
  IF diff > maxdiff THEN
    maxdiff = diff
    nbmax = 0
  END IF
  IF diff = maxdiff THEN nbmax = nbmax + 1
  totdiff = totdiff + diff
  IF diff > limit THEN nboverlimit = nboverlimit + 1
NEXT match
PRINT " "
PRINT "Maximum error: ";
PRINT USING "##.##"; maxdiff;
PRINT " (elo diff = ";
PRINT USING "###"; maxdiff * 7;
PRINT ") occurred in ";
PRINT USING "###.###"; nbmax / nbmatch * 100; ' fraction converted to percent
PRINT "% of the matches"
PRINT "Average error: ";
PRINT USING "##.##"; totdiff / nbmatch;
PRINT " (elo diff = ";
PRINT USING "###"; totdiff / nbmatch * 7;
PRINT ")"
PRINT "Prob( error > ";
PRINT USING "##.##"; limit;
PRINT "% ) = ";
PRINT "Prob( elo diff > ";
PRINT USING "###"; limit * 7;
PRINT " ) = ";
PRINT USING "##.##"; nboverlimit / nbmatch * 100;
PRINT "%"
======= Stop copying here ======