Subject: New tool to estimate the statistical significance of match results

Author: Rémi Coulom

Date: 07:07:36 07/17/04


I have made a new tool to estimate the likelihood that one program is better
than another, based on game results against the same opponents. You can download
it here:

I have no evidence, but I expect its results to be more satisfactory than any
result based on Elo theory. This tool was very useful to me during the
preparation of the WCCC, to test differences between versions of TCB. I hope
some of you will find it useful too.

Here is a sample output of MonteCarlo.exe:

  This program evaluates the likelihood that program A is better than
program B, based on the result of two matches played against the same
opponent (or set of opponents). The number of games played in each of
these matches does not have to be the same. If playing against a set
of opponents, the proportion of each opponent should be the same in each
match.
  The likelihood is estimated by Bayesian inference, assuming a uniform
prior distribution of the probabilities of losing and winning.
  The resulting integral is estimated with a Monte-Carlo method. It may
take a long time to converge when the number of games is large (>100).
The computation can be interrupted at any time with Ctrl-C.

A wins   = 3
A losses = 4
A draws  = 5

B wins   = 6
B losses = 7
B draws  = 8

P(A>B) = 0.459296 (127000000 Iterations)
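
For readers who want to see the idea in code, here is a minimal Python sketch of the kind of computation the sample output describes. It is not the source of MonteCarlo.exe, and several points are assumptions: a uniform prior over (win, loss, draw) probabilities is taken to mean a Dirichlet(1, 1, 1) prior, so the posterior after w wins, l losses and d draws is Dirichlet(1+w, 1+l, 1+d); "A is better than B" is taken to mean that A has the higher expected score, with a draw counted as half a point; and the names `sample_dirichlet` and `prob_a_better` are made up for this example.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw one point from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def prob_a_better(a_wld, b_wld, iterations=100_000, seed=None):
    """Monte Carlo estimate of P(A > B).

    a_wld and b_wld are (wins, losses, draws) tuples for the two matches.
    """
    rng = random.Random(seed)
    # Assumption: uniform prior = Dirichlet(1, 1, 1), so the posterior is
    # Dirichlet(1 + wins, 1 + losses, 1 + draws).
    a_alpha = [1 + n for n in a_wld]
    b_alpha = [1 + n for n in b_wld]
    hits = 0
    for _ in range(iterations):
        a_win, a_loss, _a_draw = sample_dirichlet(rng, a_alpha)
        b_win, b_loss, _b_draw = sample_dirichlet(rng, b_alpha)
        # Assumption: "better" means higher expected score.
        # score = p_win + 0.5 * p_draw = 0.5 + 0.5 * (p_win - p_loss),
        # so comparing scores is the same as comparing p_win - p_loss.
        if a_win - a_loss > b_win - b_loss:
            hits += 1
    return hits / iterations


if __name__ == "__main__":
    # Same counts as in the sample output above.
    print(prob_a_better((3, 4, 5), (6, 7, 8), iterations=200_000, seed=1))
```

Under these assumptions, and with enough iterations, the estimate for the counts above should come out in the neighbourhood of the P(A>B) figure reported in the sample output; the exact digits depend on the seed and the number of iterations.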


