Computer Chess Club Archives


Search

Terms

Messages

Subject: Match results - a complete(!) theory (long)

Author: Peter Fendrich

Date: 13:14:34 02/18/01



In another message Bruce Moreland raised the question
about how to interpret a match result.
With a match result between two players, he wanted to
express things like:
Player1 is better than Player2 with a certain significance
level.

The following algorithms will try to give a general method
to deal with and evaluate match results. The Bruce example
will show up in the end of this message.

This is not found in the standard statistic course and I
haven't seen any description of this so I wanted to give it
a try because it should be rather straight forward. I have
to admit however that it gave me some headache before the
obvious solution appeared.

My intention is to create a program based on these ideas,
maybe adjusted after your comments, and place that program
here or on any other well known site so we don't have to
struggle all this over and over again whenever some new
results drop in. :)

If you don't want to read all this "crap" you can instead
get my little program and test it. In the end of this message
there is some more info.
Just send me an email and I'll return the source code and an -exe file.

Be patient because the program isn't written yet but very soon!!!

Some basic definitions and formulas
===================================
The real problem here is when we have small number of games.
When we have a lot of games he normal distribution could be used.
We will use the Trinomial distribution which is the only one
giving the exact probabilities when the outcome has three distinct
outcomes.

N  = number of games
W  = number of wins         (for Player1)
D  = number of draws
L  = number of losses       (for Player1)
Pw = probability of a win   (for Player1)
Pd = probability of a draw
Pl = probability of a loss  (for Player1)
hence N = W+D+L and Pw+Pd+Pl = 1.0

P    = probability of a certain match result with exactly W, L and D
cumP = probability of a certain match result or better (for Player1)
       This function is often used but we will avoid it here.
With the help of the Trinomial distribution we can get the P if we
know the values of both Pw and Pd:
P(N,W,D,Pw,Pd)  = (N! / W!D!L!) Pw^W * Pd^D * Pl^L  where L=N-W-D

One problem, of course, is that we don't know the real values of
Pw, Pd and Pl.

The Question to answer
======================
Given a match result W, D and L between Player1 and Player2.

H0: Player1 better than Player2 or in other words
    Pw+Pd/2 > 0.5
P(H0) is the probability that H0 is true. (I don't care about rejecting
H0 because it's not necessary)

Let's start with the assumption that we don't have any prior knowledge
of the distribution of Pw and Pd. That means that they are evenly
distributed within the area formed by Pw and Pd each as an axis between
0 to 1 and where Pw+Pd <= 1.

As said before the probability that a certain game result will occur
given that Pw and Pd is computed from the Trinomial function.

Now let's look at the analytical solution.

First compute the integral of that Trinomial function (with W,D and
L fixed) over both Pw and Pd. Pw is in interval 0 to 1 and Pd is in
the interval 0 to 1-Pw for each Pw. We now have a double integral that
is the complete space of all probabilities of all possible combinations
of Pw and Pd for a specific match result.
Let's call this result Pcomplete.

Secondly let's take the same integral for all Pw+Pd/2 > 0.5. It means
that Pw is in the interval 0 to 1 and Pd from max(0, 1-2Pw) to (1-Pw)
for each Pw. This integral is the space of all probabilities of all
possible combinations of Pw and Pd when Player1 is the best one
(Pw+Pd/2 > 0.5).
Let's call this result Pbetter.

The final conclusion
====================
The probability that Player1 is better than Player2 is Pbetter/Pcomplete.

A numeric method
================
Maybe the theory above is hard to grasp but the following is a computer
simulation that should give the same result in the long run.

1. We have a result, let's say 7-3 with W=6, D=2 and L=2.
   Declare 2 counters. One representing A better than B and the
   other B is better than or equal to A.
2. Pick a pair of Pw, Pd randomly so that 0 <= Pw+Pd <= 1.
3. Compute P = Trinomial(10, 6, 2, Pw, Pd)
4. Play a match with 10 games where each game has a random outcome
   weighted by P.
5. If (the match result is 7-3 with W=6, D=2 and L=2)
       If (Pw+Pd/2 > 0.5) increment counter1
       else increment counter2
   else do nothing
6. Loop 2-5 MANY MANY times.
7. Compute counter1/(counter1+counter2)

This will get us the probability for the match result with no prior
knowledge of Pw and Pd. If we know anything at all about Pw or Pd we
could let the random picking under 2 reflect this.
In order to make this code a bit more effective we could change 4 and 5
to
4. If (Pw+Pd/2 > 0.5) Add P to counter1 (must now be a double)
   else Add P to counter2 (also a double)
5. No operation
We will get the same result but faster.
In order to reflect any prior knowledge we could use weights in the
integrals above to capture that knowledge.
In fact the last algorithm is equal to how the integral above would
be computed with the Monte Carlo method. The alg given here is a perfect
simulation of the integral that was showed to be the solution.

My little program
=================
If we could agree of the above theory then it is best documented
in a program.
Given a match result the pgm will produce:
  - What is the probability to get such a result depending. on the
    strength difference between Player1 and 2
  - What is P(H0) meaning the probability that Player1 is better than Player2.

When it works I would like to extend the pgm with some other functions.
Send me an email and I will return the program to anyone interested.

A few other comments
====================
I don't rely on figures like this, only!
The statistics used here is based on the match outcome only. There is
a lot of other information that can't covered by the bare match result
numbers. By looking at the individual games we will get a lot of other
information that is hard to include in statistic methods. Patterns of
all kinds, like Player2 is broken and always loses in 4 moves or whatever,
might be more valuable than the result 10-0 only.

Furthermore I'm not that found of this A better than B hypothesis. I am
more found of other methods like confidence intervals. Maybe we could
include that as well in the program later on.

For matches counting many games, like 100 or more, we could use the Normal
distribution instead of the described number crunching but exact method.
It should give about the same results. The Normal distribution offers a
variety of useful functions and is the basis for the ELO system.

When dealing with match results and not knowing the number of
draws we could set D to either some common sense value of the number
of draws or the worst case which is as few draws as possible that still
follows the match result.
The result 60-40 with W=60, D=0, L=40 is more "spread out" and shows
more variance, meaning less accurate, than with W=20, D=80, L=0. See the next
topic.

The "Bruce example"
===================
We have two results 10-0 and 60-40 respectively. Which one is the most
'valid'? Let's translate 'is the most valid' with 'has the highest
probability that A is better than B'. We will see that it dependes of the
number of draws in the 60-40 match.
The following figures are computed by my program and are subject to errors
that I haven't tried to estimate yet. At least 4 decimals should be ok.

1) 10 games between A and B. Result 10-0

Let's try some different values of Pw and Pd

10 games. 10-0-0, Result: 10-0
A few combinations of Pw and Pd:
P(Win)  P(Draw) P(res)  	cumP
------  ------- -----------     ---------
0.33    0.33    0.000016935     0.0000169
0.40    0.20    0.000104858     0.0001049
0.50    0.00    0.000966841     0.0009668
0.37    0.26    0.000048086     0.0000481
0.33    0.35    0.000013147     0.0000131
0.01    0.99    0.000000000     0.0000000
0.80    0.10    0.107374182     0.1073742

Probability for A better than B is: 0.9995087

2) 100 games between A and B. Result: 60-40
   (Nothing is said about number of draws. Let's try two cases)

 a) No draws in the match
    100 games. 60-40-0, Result: 60-40
    A few combinations of Pw and Pd:
    P(Win)  P(Draw) P(res)  	    cumP
    ------  ------- --------------- ----------------
    0.33    0.33    2.66726982e-020 0.008335829
    0.40    0.20    2.20893466e-012 0.01447611
    0.50    0.00    0.00981144545   0.02784797
    0.37    0.26    9.08565295e-016 0.01155849
    0.33    0.35    2.12089826e-021 0.007664598
    0.01    0.99    1.08438667e-202 2.350298e-026
    0.80    0.10    2.10660425e-018 1.000000000

    Probability for A better than B is: 0.9759706

 b) Maximum number of draws (80)
    100 games. 20-0-80, Result: 60-40
    A few combinations of Pw and Pd:
    P(Win)  P(Draw) P(res)          cumP
    ------  ------- --------------- -----------------
    0.33    0.33    1.03987806e-027 0.008335829
    0.40    0.20    7.12444101e-044 0.01447611
    0.50    0.00    5.01027042e-226 0.02784797
    0.37    0.26    1.95458083e-035 0.01155849
    0.33    0.35    3.10643366e-026 0.007664598
    0.01    0.99    2.2875309e-026  2.350298e-026
    0.80    0.10    6.17946754e-062 1.000000000

    Probability for A better than B is: 0.9999995

...and in the end
=================
Thanks for reading all this. I hope there are no typos or logical
errors, but no warranties at all are given!

The word "complete" in the subject was meant to provoke a little.
There are a lot of other approaches of course.
I once wrote an article in the "Swedish ICCA" (now SSDF) magazine
PLY, about this topic. That article was completely based on the
normal distribution and later became the basis for how SSDF computes
the ELO's and the corresponding confidence levels.
That was a long time ago and modern methods are more or less based on
computer power.
This time I had good help of my son David who opened my eyes for
computer simulations as a method to find the analytical conclusion.
Maybe more practical and with some interesting possibilities.
...and fascinating!

Regards
//Peter



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.