Computer Chess Club Archives


Subject: Re: Ok, Ok can we all agree that comps. are as strong if not better than FM.

Author: Walter Koroljow

Date: 14:29:49 01/18/01


On January 18, 2001 at 07:44:13, Dann Corbit wrote:

>On January 18, 2001 at 07:31:56, Walter Koroljow wrote:
>[snip]
>>By the way, I do not understand what a composite computer player is.
>
>Your rating is based upon repeated observations.  This is required to form an
>ELO number.  But the conditions of each measurement are different.  Different
>programs or versions of programs.  Different machines.  The experiment has no
>control and the data points are not valid for the conditions of ELO calculations
>except as guesses and extrapolation.
>
>In order to calculate ELO, a magical machine opponent that does not exist has
>been formed.  It has fought Anand and gone to Dortmund and all sorts of exciting
>things.  However, this composite beast is a figment of the imagination because
>it does not exist.
>
>All that having been said, if the machine were held constant and the program
>held constant so that we actually had something stable to measure, I doubt if
>the outcome would change any.

Hmmm. I think you have a direct calculation in mind, as is done by the USCF.  In
that case, your observations make perfect sense.  But my approach was an indirect
proof, in which I assume something, show it leads to a bad prediction, and reject
the assumption.  What is eventually left is the truth.  I don't need the
conditions you mention.  I need other conditions, e.g., statistical independence
of game results.

Maybe I botched up my explanation, or there was too much mind-numbing detail.
I'll take another shot at explaining and see if that gets us anywhere.  Here is
the essence of what I did: the major parts, without the details.

To simplify calculations, assume there is no such thing as a draw.  Suppose we
have a player, John, rated 2200.  John plays 80 games with Heinz and 80 games
with Senior.  Heinz and Senior are computers.  The computers score 103 out of 160.

We ask, "Could the computers both be rated 2250?"  So we assume the computers
are rated 2250, simulate the 160 games a million times, and keep track of the
total scores we get.  We find that the computers score 102 or more less than 5%
of the time.  The actual score of 103 is in that unlikely range, so we conclude
that the computers' rating cannot be 2250: that rating would lead with high
probability to a prediction in conflict with reality (a score of 103).  Note
that to come to this conclusion we do not need to know the playing conditions,
the machines need not be identical, etc.  All we need are assumed ratings, the
actual score, and some statistical assumptions (independence) that allow us to
calculate the probabilities of results.
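
A minimal sketch of this kind of simulation, under stated assumptions: the post
does not say which rating formula was used, so win_prob below uses the standard
logistic Elo expectation, and the names win_prob and tail_frequency are made up
for illustration.  It uses fewer trials than the million described above to keep
the run short, and it counts how often the computers reach the actual score of
103 or better.

```python
import random

def win_prob(rating, opp_rating):
    """Chance that `rating` beats `opp_rating` in a game with no draws.
    Assumption: standard logistic Elo curve; the post names no formula."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def tail_frequency(comp_ratings, human_rating, games_each, threshold, trials, seed=0):
    """Fraction of simulated matches in which the computers' combined
    number of wins is at least `threshold`. Games are independent."""
    rng = random.Random(seed)
    probs = [win_prob(r, human_rating) for r in comp_ratings]
    hits = 0
    for _ in range(trials):
        total = 0
        for p in probs:
            # Each computer plays `games_each` independent games.
            total += sum(1 for _ in range(games_each) if rng.random() < p)
        if total >= threshold:
            hits += 1
    return hits / trials

# Hypothesis under test: both computers are rated 2250, John is 2200.
freq = tail_frequency([2250, 2250], 2200, games_each=80, threshold=103, trials=20000)
print(f"Estimated P(computers score >= 103 | both rated 2250) = {freq:.4f}")
print("reject the hypothesis" if freq < 0.05 else "cannot reject the hypothesis")
```

Nothing in the sketch refers to hardware, program versions, or playing
conditions; the only inputs are the assumed ratings, the number of games, and
the actual score.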

We ask, "Could the average rating of the computers be 2250 with a spread of 50?"
That is, could one computer be 2225 and the other 2275?  So we assume these
ratings and simulate the 160 games another million times.  We get 102 or more
less than 5% of the time.  So we conclude that the computers' average rating
cannot be 2250 with a spread of 50.  We try different spreads...
.
.
.
We ask, "Could the average rating of the computers be 2250 with a spread of
300?"  That is, could one computer be 2100 and the other 2400?  So we assume
these ratings and simulate the 160 games another million times.  This time we
find that we get 100 or more less than 5% of the time.  So we conclude that the
computers' average rating cannot be 2250 with a spread of 300.
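
The same check can be done without simulation by computing the exact
distribution of the total score.  This is a swapped-in technique, not the
million-game simulation described above; it is a sketch that assumes the
standard logistic Elo expectation, and it takes "spread" to mean the difference
between the two ratings, as in the 2100/2400 example.  The function names are
made up for illustration.

```python
def win_prob(rating, opp_rating):
    """Assumption: standard logistic Elo expectation, no draws."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def score_distribution(probs_and_games):
    """Return dist where dist[k] = P(total wins == k) for independent games.
    `probs_and_games` is a list of (win probability, number of games)."""
    dist = [1.0]  # before any game is played, the total is 0 with certainty
    for p, n in probs_and_games:
        for _ in range(n):
            nxt = [0.0] * (len(dist) + 1)
            for k, mass in enumerate(dist):
                nxt[k] += mass * (1.0 - p)   # this game is lost
                nxt[k + 1] += mass * p       # this game is won
            dist = nxt
    return dist

def tail_prob(avg, spread, human=2200, games_each=80, observed=103):
    """P(computers score >= observed) when their ratings are
    avg - spread/2 and avg + spread/2."""
    low, high = avg - spread / 2, avg + spread / 2
    dist = score_distribution([
        (win_prob(low, human), games_each),
        (win_prob(high, human), games_each),
    ])
    return sum(dist[observed:])

for spread in (0, 50, 100, 200, 300):
    print(f"average 2250, spread {spread:3d}: P(score >= 103) = {tail_prob(2250, spread):.4f}")
```

Each line of output is one hypothesis about the two ratings; a tail probability
below 0.05 means that hypothesis is rejected at the 95% level.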

But we know that the computers are of about equal strength, so we have covered
more than the full range of reasonable spreads in their ratings.  We know that
their average rating cannot be 2250 with any reasonable spread.  So we conclude
that their average rating cannot be 2250.

We now try an average rating of 2260 with different spreads.  If that doesn't
work, we keep trying higher average ratings until we come to the first one we
cannot reject for some spread.  This could be the truth, or the truth could be
even higher (we don't know the true spread, after all).  That average is the
lower bound of the 95% confidence interval.  The calculation is over.
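
The whole search can be sketched as a loop over average ratings and spreads.
This is an illustration under assumptions, not the original calculation: it
uses the standard logistic Elo expectation, exact tail probabilities in place
of simulation, spreads from 0 to 300 in steps of 50 (spread meaning the
difference between the two ratings), and a step of 10 rating points.  The
function names and the search limits are made up for illustration.

```python
def win_prob(rating, opp_rating):
    """Assumption: standard logistic Elo expectation, no draws."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def tail_prob(ratings, human, games_each, observed):
    """Exact P(total wins >= observed) for independent games,
    with `games_each` games played by each computer in `ratings`."""
    dist = [1.0]
    for r in ratings:
        p = win_prob(r, human)
        for _ in range(games_each):
            nxt = [0.0] * (len(dist) + 1)
            for k, mass in enumerate(dist):
                nxt[k] += mass * (1.0 - p)
                nxt[k + 1] += mass * p
            dist = nxt
    return sum(dist[observed:])

def lower_bound(human=2200, games_each=80, observed=103, alpha=0.05,
                start=2250, stop=2600, step=10, spreads=range(0, 301, 50)):
    """First average rating (scanning upward from `start`) that cannot be
    rejected for at least one spread; None if every average is rejected."""
    for avg in range(start, stop + 1, step):
        for spread in spreads:
            pair = (avg - spread / 2, avg + spread / 2)
            if tail_prob(pair, human, games_each, observed) >= alpha:
                return avg  # some spread survives, so this average survives
    return None

print("lower bound of the 95% confidence interval:", lower_bound())
```

An average rating is rejected only if every spread tried is rejected, which is
why the inner loop returns as soon as one spread survives.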

This is the basic logic of the calculation.  I cannot see where a composite
computer, etc., comes into it.  All I see is a probability distribution of
results for two computers with assumed ratings.

Does this make sense to you?

Anyway, I am going to have supper and go to the chess club, where I haven't been
in ages.  Have a good evening.

Walter



Last modified: Thu, 15 Apr 21 08:11:13 -0700
