Computer Chess Club Archives

Subject: Re: Is there a statistician in the house?

Author: Dann Corbit

Date: 18:35:09 05/08/01

On May 08, 2001 at 14:05:39, Gian-Carlo Pascutto wrote:

>On May 08, 2001 at 10:43:41, Martin Schubert wrote:
>
>>How should this be possible? First you need a zero hypothesis, e.g. Fritz is as
>>good as Junior. Okay, that's not the problem. But statistics is only possible
>>when results are independent. When you're using booklearning, they're not
>>independent. So you can't calculate a degree of confidence.
>
>I can try to offer 2 solutions, but I don't know if they are good enough
>
>a) assume booklearning has no influence on the zero hypothesis, i.e.
>that the learning of Junior and Fritz is equally good. This sounds
>reasonable, but may not be correct.
>
>b) assume the booklearning is part of the zero hypothesis, so that
>the strength of a program is also determined by its book learning
>abilities.
>
>If either of these fail, I would appreciate it if you could point
>out why. This is not my area, but I'd like to learn more.

My approach would be to make assumptions and then test them.  For instance:

Hypothesis: Fritz with book learning plays stronger than Fritz without book
learning.

Test: play 500 games of Fritz against itself -- one engine with book learning
and one without.

You might be able to measure the ELO change as a function of time and make a
prediction based on the measurements.  I'm quite sure that would take more than
500 games, though.
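As a rough illustration of what "measuring the ELO change" from a match could look like, here is a minimal sketch. It assumes the usual logistic rating curve and a normal approximation for the error bars, and it treats the games as independent, which is exactly the assumption book learning puts in doubt. The function names (elo_from_score, match_elo) are made up for this example.

```python
import math


def _clamp(p, eps=1e-6):
    """Keep a score strictly inside (0, 1) so the logarithm stays finite."""
    return min(max(p, eps), 1.0 - eps)


def elo_from_score(score):
    """Rating difference implied by a score fraction, using the logistic curve."""
    s = _clamp(score)
    return -400.0 * math.log10(1.0 / s - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) rating difference for one match.

    Assumption: games are independent and the mean score is roughly normal.
    z=1.96 corresponds to an approximate 95% interval.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


if __name__ == "__main__":
    est, low, high = match_elo(wins=275, draws=50, losses=175)
    print(f"estimate {est:+.0f}, interval {low:+.0f} to {high:+.0f}")
```

If the interval still straddles zero after 500 games, the match has not shown that either side is stronger. To look at the change over time, the same calculation could be repeated on successive blocks of games, at the cost of wider error bars per block.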

Repeat the same experiment with Junior.

Now, you could make a hypothesis that Junior with booklearning is
stronger/weaker/the-same-as Fritz with book learning.  Run an experiment and
measure it.

If there is a bias, you should be able to measure that also.  The big problem
with all of these experiments is that it will take an enormous number of trials
to find the answer.

Suppose one program is 500 ELO stronger than another program.  You can detect
this with only a few dozen games fairly reliably.

Suppose that it is only 50 ELO stronger ... Now it will take many thousands of
games to get a reliable answer.

Suppose that it is only 5 ELO stronger ... Now you will probably never find out,
especially with things like book learning that alter the experiment as it is
running.
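To get a feel for how fast the required number of games grows as the rating gap shrinks, here is a back-of-the-envelope sketch. It assumes the logistic rating curve, independent games, and a worst-case per-game standard deviation of 0.5 (no draws). It only asks how many games are needed before the expected margin reaches about two standard errors, so it is a floor: asking for a reliable detection, or allowing for games that are correlated through book learning, pushes the numbers up considerably. The function names are made up for this example.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic rating curve."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, z=1.96, per_game_sd=0.5):
    """Rough number of games before the expected edge equals z standard errors.

    Assumptions: independent games, normal approximation, and per_game_sd=0.5,
    which is the worst case (draws would lower it).
    """
    edge = expected_score(elo_diff) - 0.5
    if edge <= 0:
        raise ValueError("elo_diff must be positive")
    return math.ceil((z * per_game_sd / edge) ** 2)


if __name__ == "__main__":
    for gap in (500, 50, 5):
        print(f"{gap:>3} ELO gap: roughly {games_needed(gap)} games at minimum")
```

The count scales with one over the square of the edge, so cutting the rating gap by a factor of ten multiplies the number of games by roughly a hundred.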

In short, with two very strong programs of about the same strength, we don't
know which one is stronger and probably can't really even find out.  If there is
book learning involved, it makes our predictions even less reliable.

So what are we left with?
A. We could name a champion.
B. We could toss a coin.
C. We could let them play a series of games and choose the winner as champion.
D. We could do something completely different.

Most of the time, we tend to opt for choice "C." -- even though it's not much
more reliable than the other methods.
;-)

Of course, it may be that one program is clearly stronger than the others, in
which case we will probably pick the stronger program with high certainty.  Not
very many contests turn out like that.  Certainly the Fritz/Junior affair did
not.

Last modified: Thu, 15 Apr 21 08:11:13 -0700