Computer Chess Club Archives


Subject: Re: Dummy Cadaques Tournament (Long)

Author: Christophe Theron

Date: 01:07:44 01/28/00


On January 27, 2000 at 22:55:51, Michael Neish wrote:

>
>Hi,
>
>Before I get flamed, by "dummy" I mean fake.  I'm not calling anyone stupid. :)
>
>Anyone could do this in a few minutes.  I ran a Cadaques-style tournament
>between seven fictitious computer programs, i.e., seven programs play each other
>over matches consisting of 20 games each, 420 games in total for the whole
>tournament.
>
>I made the following assumptions:
>
>1)  The computers are all of equal strength.
>
>2)  The probability of a win, draw or loss are one-third each.
>
>These assumptions are for simplicity's sake only.  If anyone can suggest better
>win/draw/loss probabilities please let me know, although for the sake of this
>post I don't think they make much difference.


(big snip)



That's funny. A few days ago I had a talk with Marcus Kästner (ChessBits) and
explained to him that a great many games are needed to determine which of two
programs is the better one.

In the end I wrote this simple QBasic program to make my argument more
striking.

The program makes the same assumptions as you about the percentages of losses,
draws and wins.

It simulates a number of repeated matches between 2 programs OF ABSOLUTELY EQUAL
STRENGTH. It asks you for the number of matches (say, tournaments) and the
number of games in each match, and can even compute the probability that the
error margin is above a certain level (the error is easy to compute, because the
expected outcome is always 50%).

With this program it is even possible to estimate how many games must be played
before you can tell that one program is better than its opponent by a given Elo
margin.
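The "how many games" question can also be sketched analytically. The following is a rough Python sketch, not part of RNDMATCH.BAS. It assumes the same one-third win/draw/loss model, a normal approximation to the match score, a z value of 1.96 (about 95% two-sided) and the 7-Elo-per-percentage-point shortcut used in the listing. The names `error_margin_percent` and `games_needed` are made up for this illustration.

```python
import math

# Assumption (from the post): win, draw and loss each have probability 1/3.
# One game scores 1, 0.5 or 0 for player A, with a mean of 0.5.
PER_GAME_SD = math.sqrt((1.0**2 + 0.5**2 + 0.0**2) / 3 - 0.5**2)  # sqrt(1/6)

# Rule of thumb used by RNDMATCH.BAS near a 50% score.
ELO_PER_PERCENT = 7


def error_margin_percent(nb_games, z=1.96):
    """Half-width, in percentage points, of the band around 50% that a match
    between equal programs stays inside with the confidence implied by z
    (normal approximation)."""
    return 100 * z * PER_GAME_SD / math.sqrt(nb_games)


def games_needed(elo_margin, z=1.96):
    """Smallest number of games whose error margin fits inside elo_margin."""
    margin_percent = elo_margin / ELO_PER_PERCENT
    return math.ceil((100 * z * PER_GAME_SD / margin_percent) ** 2)


for elo in (70, 35, 10):
    print(elo, "Elo ->", games_needed(elo), "games")
```

Because the margin shrinks only with the square root of the number of games, halving the Elo margin you want to resolve costs about four times as many games.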

If you are interested, just play with this program. It is a QBasic program. You
need to copy the OLDMSDOS directory of your Windows 95/98 CD to the
C:\WINDOWS\COMMAND directory in order to be able to run it.

To run it:
* open a DOS box
* go into the directory where you have copied the following RNDMATCH.BAS file
* type
    QBASIC /RUN RNDMATCH

When the program is over, press F5 to run it again.


I have just run it. My sample is 1000 matches of 200 games each. The program
tells me that with 200 games I can only be sure that one program is stronger if
the Elo difference between the two is above 35 points, and even then only with
93.5% confidence: in 6.5% of the simulated matches between equal programs the
score was more than 5 percentage points away from 50%, which is about 35 Elo.

If the programs are closer than 35 Elo points, 200 games are not enough to be
sure which is better.
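A figure simulated from 1000 matches carries some sampling noise of its own, so it can be worth cross-checking against the exact distribution. The Python sketch below is not from the original program. It assumes the same one-third win/draw/loss model and builds the exact distribution of the match score by repeated convolution. The function name `prob_error_greater_than` is invented for this example.

```python
def prob_error_greater_than(nb_games, limit_percent):
    """Exact P(|percent - 50| > limit_percent) for one match between equal
    programs, assuming win, draw and loss each have probability 1/3."""
    # dist[k] = probability that player A has scored k half-points so far.
    dist = [1.0]
    for _ in range(nb_games):
        new = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            new[k] += p / 3      # loss: +0 half-points
            new[k + 1] += p / 3  # draw: +1 half-point
            new[k + 2] += p / 3  # win:  +2 half-points
        dist = new
    # percent = 50 * k / nb_games, so the error exceeds the limit when
    # 50 * |k - nb_games| > limit_percent * nb_games.
    return sum(p for k, p in enumerate(dist)
               if 50 * abs(k - nb_games) > limit_percent * nb_games)


p = prob_error_greater_than(200, 5)
print(f"Prob( error > 5% ) = {100 * p:.2f}%")
```

The score is counted in whole half-points so that the comparison against the limit uses integers where possible, which avoids the rounding issues of comparing floating-point percentages.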

Number of matches: 1000
Number of games in each match: 200
Compute probability of error greater than: 5



    Christophe



Quick Basic listing of RNDMATCH.BAS:

(I don't claim any copyright on it, feel free to spread it everywhere)

======= Start to copy from the line below ======

CLS
PRINT "*** Simulation of chess matches ***"
PRINT
PRINT "We assume that opponents are exactly of equal strength."
PRINT "So the expected score of each is exactly 50%."
PRINT

RANDOMIZE TIMER

INPUT "Number of matches to play "; nbmatch
INPUT "Number of games in each match "; nbgames
INPUT "Compute probability of error greater than "; limit

totdiff = 0
maxdiff = 0
nbmax = 0
nboverlimit = 0

FOR match = 1 TO nbmatch

  total = 0       ' total is the score of player A

  FOR game = 1 TO nbgames

    result = INT(RND * 300) ' result between 0 and 299 included

    IF result < 100 THEN
      ' DRAW
      'PRINT "1/2 - 1/2"
      total = total + .5
    ELSE
      IF result < 200 THEN
        ' PLAYER A LOSES: total stays unchanged
        'PRINT " 0  -  1"
      ELSE
        ' PLAYER A WINS
        'PRINT " 1  -  0"
        total = total + 1
      END IF
    END IF

  NEXT game

  percent = total / nbgames * 100
  diff = ABS(percent - 50)

  PRINT match;
  LOCATE CSRLIN, 1
  'PRINT "Outcome of the match: "; total; "-"; nbgames - total; "  (";
  'PRINT USING "###.##"; percent;
  'PRINT "%, error=";
  'PRINT USING "##.##"; diff;
  'PRINT " )"

  IF diff > maxdiff THEN
    maxdiff = diff
    nbmax = 0
  END IF
  IF diff = maxdiff THEN nbmax = nbmax + 1
  totdiff = totdiff + diff
  IF diff > limit THEN nboverlimit = nboverlimit + 1

NEXT match

PRINT "      "
PRINT "Maximum error: ";
PRINT USING "##.##"; maxdiff;
PRINT "   (elo diff = ";
PRINT USING "###"; maxdiff * 7;
PRINT ")   occurred in ";
PRINT USING "###.###"; nbmax / nbmatch * 100;
PRINT "% of the matches"
PRINT "Average error: ";
PRINT USING "##.##"; totdiff / nbmatch;
PRINT "   (elo diff = ";
PRINT USING "###"; totdiff / nbmatch * 7;
PRINT ")"
PRINT "Prob( error > ";
PRINT USING "##.##"; limit;
PRINT "% ) = ";
PRINT "Prob( elo diff > ";
PRINT USING "###"; limit * 7;
PRINT " ) = ";
PRINT USING "##.##"; nboverlimit / nbmatch * 100;
PRINT "%"

======= Stop copying here ======
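The listing converts a score error to Elo by multiplying by 7. As a rough check of that shortcut, here is a small Python sketch (not part of RNDMATCH.BAS) that compares it with the standard logistic Elo curve. Using the logistic curve is an assumption of this example, and the function names are invented for it.

```python
import math


def elo_diff_from_percent(percent):
    """Elo difference implied by an expected score given in percent, using
    the standard logistic Elo curve (an assumption of this sketch;
    RNDMATCH.BAS uses the linear shortcut below instead)."""
    score = percent / 100
    return 400 * math.log10(score / (1 - score))


def elo_diff_linear(percent):
    """The shortcut from the listing: 7 Elo per percentage point above 50%."""
    return (percent - 50) * 7


for percent in (51, 55, 60, 75):
    print(percent, round(elo_diff_from_percent(percent), 1),
          elo_diff_linear(percent))
```

The two agree closely for scores near 50%, which is the range the simulation cares about, and drift apart for lopsided results.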





Last modified: Thu, 15 Apr 21 08:11:13 -0700
