Computer Chess Club Archives


Subject: Re: Statistics and Test results

Author: Chris Welty

Date: 00:53:39 10/08/04


Previous message was accidentally sent while I was typing it, and it's wrong so
please ignore...

>I did some more research on this--

>We are assuming here that an engine's win percentage follows a binomial
>distribution with probability p, meaning that, out of "n" games played (with no
>draws), we can expect the engine to win n*p games on average.

Right.

>3) the interval is then given approximately by:
>
>phat +/- z(a/2) sqrt (  phat * (1-phat) / n )
>

This is usually a pretty good approximation. It's even better if you use it in
reverse and calculate the confidence interval for phat from the known
null-hypothesis value. In this case the null hypothesis is that the engines
are of equal strength (i.e. win percentage 0.5), so the bounds of the confidence
interval are

phat = 0.5 +/- z(a/2) sqrt(0.5*0.5/n)

or

abs((2*phat-1)*sqrt(n)) = z(a/2)

In the language of my original post (where phat=W/(W+L), n=R, z(a/2)=T, and
S=W-L, so that 2*phat-1 = S/R),

T=abs(S/sqrt(R))

Which is the formula from my original post.
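
As a rough illustration of the two equivalent forms above, here is a minimal
Python sketch. The function names and the 60-40 score are made up for the
example; only the formulas come from this thread. It assumes draws have already
been dropped and uses the standard library's NormalDist for z(a/2).

```python
from math import sqrt
from statistics import NormalDist


def sign_test_statistic(wins, losses):
    """T = abs(S / sqrt(R)) with S = W - L and R = W + L (draws excluded)."""
    s = wins - losses
    r = wins + losses
    return abs(s) / sqrt(r)


def critical_z(alpha=0.05):
    """Two-sided critical value z(a/2) of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def null_interval(n, alpha=0.05):
    """Bounds 0.5 +/- z(a/2) * sqrt(0.5*0.5/n) for phat under equal strength."""
    half_width = critical_z(alpha) * sqrt(0.25 / n)
    return 0.5 - half_width, 0.5 + half_width


# Hypothetical result: 60 wins, 40 losses, draws ignored.
wins, losses = 60, 40
t = sign_test_statistic(wins, losses)      # 20 / sqrt(100) = 2.0
low, high = null_interval(wins + losses)
phat = wins / (wins + losses)

# The two checks are the same test written two ways.
print("T exceeds z(a/2):", t > critical_z())
print("phat outside null interval:", not (low <= phat <= high))
```

Both checks always agree, since the second form is just the first one
rearranged.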



In answer to your other points,
>1. You drop draw scores-- you should be either rolling them into wins or losses
>or using a multinomial distribution model.

Draws are certainly relevant if I'm trying to decide HOW MUCH better one engine
is than another. They're not relevant to the question "does A beat B more than B
beats A" which is why I restricted my whole note to this case.

>2. You did not state which distribution you believed the number of wins -
>number of losses to follow.

If the sample is random, and independent, and identically distributed (iid) then
the distribution has to be binomial.

>2a. Your sample is not random, although if this continues to be a sticking
>point, I will drop it as the rest of my claims are valid and can't be refuted.

There's no statistical test to prove a sample is iid; there are tests to prove
it's not. In my testing I've not found any nonrandomness but since the whole
method hinges on this I'd be very interested in any evidence that it's not.
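
For anyone who wants to look for such evidence, one common check is the
Wald-Wolfowitz runs test on the win/loss sequence in the order the games were
played. This is not necessarily what I used, just a minimal sketch of one such
test; the function name and the 'W'/'L' string encoding are assumptions made for
the example. It only looks for serial dependence (streaks or alternation), so a
small z here does not prove the sample is iid.

```python
from math import sqrt


def runs_test_z(results):
    """Wald-Wolfowitz runs test z-score for a sequence of 'W' and 'L'.

    Characters other than 'W' and 'L' (e.g. draws) are dropped first.
    Large |z| suggests the wins and losses are not in random order.
    """
    seq = [c for c in results if c in "WL"]
    n1 = seq.count("W")
    n2 = seq.count("L")
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one win and one loss")

    # A run is a maximal block of identical results.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    # Mean and variance of the number of runs under random ordering.
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / sqrt(var)


# Hypothetical sequences: strict alternation gives too many runs (z > 0),
# one long streak each gives too few (z < 0).
print(runs_test_z("WLWLWLWLWL"))
print(runs_test_z("WWWWWLLLLL"))
```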

>3. Your transformation of variables and your test on the statistic "t" is not
>valid-- you might be assuming a normal model, but I can't know that since you
>didn't say so.  Nevertheless, even if you did, I can show you that your
>transformations do not lead to a distribution that can be safely approximated
>by the normal distribution.

The statistics in your followup post approximated the binomial distribution with
a normal one, and the formula you gave is virtually identical to my original
one. I'll assume I've convinced you on this, please let me know if I haven't.

The normal distribution is a quite accurate approximation to the binomial when
phat is close to 0.5.
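
To see how close the approximation is for a particular number of games, the
exact binomial tail under p = 0.5 can be computed directly and set beside the
normal one. This is only a sketch; the function names and the 60-out-of-100
example are hypothetical. The normal version uses T = abs(S/sqrt(R)) as written
above, with no continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_two_sided_p(wins, n):
    """Exact two-sided p-value for X ~ Binomial(n, 0.5).

    Sums the probability of every outcome at least as far from n/2
    as the observed number of wins.
    """
    deviation = abs(wins - n / 2)
    tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return tail / 2 ** n


def normal_two_sided_p(wins, n):
    """Two-sided p-value from the normal approximation, T = |2W - n| / sqrt(n)."""
    t = abs(2 * wins - n) / sqrt(n)
    return 2 * (1 - NormalDist().cdf(t))


# Hypothetical result: 60 wins out of 100 decisive games.
print("exact :", exact_two_sided_p(60, 100))
print("normal:", normal_two_sided_p(60, 100))
```

Running this for a few values of n gives a feel for how many games are needed
before the two numbers agree closely enough for your purposes.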



Last modified: Thu, 15 Apr 21 08:11:13 -0700