Computer Chess Club Archives


Subject: Re: 6 game 40/2 COMP WINS just as i predicted!

Author: Dann Corbit

Date: 18:29:54 01/11/01


On January 11, 2001 at 20:48:44, James T. Walker wrote:
>You seem to be the one who is emotional and irrational now.  Why go off on the
>deep end on something which is really simple.  I was simply suggesting to take
>the games played by top programs in the last year or so and consider them all as
>one player.

And I responded that this has no mathematical basis.  There are many reasons
why.  Let me give one model to explain why.
Fred writes faulty chess programs.  All of them have a flaw that will be exposed
over time.  But he writes a new program each day.  If you play any one of Fred's
programs once, you will be unlikely to find the flaw.  If we play 365 games with
Fred's programs, one game per program, against rated opponents, we will get a
rating.  But if we play just one of the programs for all of those games against
the same opponents, the flaw will surface and we will get a wildly different
rating.

This model sounds silly.  But if you are a computer programmer, you know that it
actually models the true situation very well.
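
A minimal sketch of the Fred model, to make the effect concrete.  Everything in
it is an assumption chosen for illustration: the function name play_series, the
5% chance per game that the opponents spot the flaw, the 50% base score, and the
rule that a program loses every game once its flaw is known.

```python
import random


def play_series(n_games, fresh_program_each_game,
                p_find_flaw=0.05, base_score=0.5, seed=0):
    """Return the average score over n_games under the toy Fred model.

    All parameters are hypothetical: p_find_flaw is the chance per game
    that the opponents discover the flaw, and base_score is the win
    probability while the flaw is still unknown (no draws, for brevity).
    """
    rng = random.Random(seed)
    flaw_known = False
    total = 0.0
    for _ in range(n_games):
        if fresh_program_each_game:
            # A brand-new program: nobody knows its flaw yet.
            flaw_known = False
        if flaw_known:
            # Once the flaw is known, the program is assumed to lose.
            result = 0.0
        else:
            result = 1.0 if rng.random() < base_score else 0.0
            if rng.random() < p_find_flaw:
                flaw_known = True
        total += result
    return total / n_games


if __name__ == "__main__":
    pooled = play_series(365, fresh_program_each_game=True)
    single = play_series(365, fresh_program_each_game=False)
    print("365 programs, one game each:", round(pooled, 3))
    print("one program, 365 games:     ", round(single, 3))
```

The pooled series scores close to the base score, while the single program
collapses soon after its flaw is found, so the two "ratings" describe very
different players.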

Now, allow me to give a reasoning point.  Some program such as Rebel or Hiarcs
has tendencies.  These tendencies could be studied and exploited.  If I play a
thousand games against one program I may learn a way to beat it.  If I play a
thousand games spread across a thousand programs, I am far less likely to learn
a way to beat any one of them.

>It is perfectly logical to assume that if only one program is of GM
>strength which many people claim is not, and you add the results of other
>programs to the statistics, you are taking a worst case scenario.  This is true
>because the other programs surely are not GM strength if even 1 is not GM
>strength.  This might give you enough games combined to determine the "average"
>strength of top programs today vs humans.  Your main contention seems to be that
>there is not enough data to determine what the strength of Rebel is but you
>don't suggest how many games vs humans it would take to establish the fact one
>way or the other.

You will never prove it conclusively, but after a few hundred games you can
offer a statistical argument.  In the case of a super GM (e.g. 2600+ ELO) you
could prove with a 2/3 probability that they were of GM (2500 ELO) strength
after only one hundred games or so.  The error bar would be about 100 and hence
the odds that the center point was below 2500 would be established.
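
As a rough illustration of how the error bar depends on the number of games,
here is a sketch that assumes the standard logistic ELO curve, opponents whose
ratings are known exactly, and independent games all played at the same
expected score.  The function name elo_standard_error and its parameters are
made up for this example.

```python
import math


def elo_standard_error(n_games, score=0.5, draw_rate=0.0):
    """Approximate one-sigma error, in rating points, after n_games.

    Idealized assumptions: independent games, a fixed expected score,
    and opponent ratings that carry no uncertainty of their own.
    """
    win = score - draw_rate / 2.0
    loss = 1.0 - score - draw_rate / 2.0
    if min(win, loss, draw_rate) < 0.0:
        raise ValueError("score and draw_rate are inconsistent")
    # Variance of a single game result (win = 1, draw = 0.5, loss = 0).
    var = (win * (1.0 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * (0.0 - score) ** 2)
    se_score = math.sqrt(var / n_games)
    # Slope of the rating difference with respect to the score fraction,
    # from inverting score = 1 / (1 + 10 ** (-diff / 400)).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return se_score * slope


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, "games:", round(elo_standard_error(n), 1))
```

Under these assumptions the bar shrinks with the square root of the number of
games.  Real error bars are wider, because the opponents' own ratings are
uncertain too.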

>How many games does it take for a human to establish
>himself/herself as equal to a GM in strength?

I think that there are two questions here.
1.  What are the qualifications of a GM?
This is answered by the bylaws of FIDE [or other governing body]
2.  How can we prove that someone is of GM strength?
The second is answered when we can mathematically demonstrate within an agreed
error bound that the ELO rating of a player must be at least 2500.

Note that these are two different questions with two different answers.
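
For the second question, one way to put a number on "at least 2500 within an
agreed error bound" is sketched below.  It assumes the measured rating is
roughly normally distributed around the true rating with a known one-sigma
error bar.  The name prob_at_least is hypothetical.

```python
import math


def prob_at_least(estimate, sigma, threshold=2500.0):
    """Probability that the true rating is at least `threshold`.

    Assumes a normal approximation: the true rating is treated as
    normally distributed around `estimate` with standard deviation
    `sigma` (the one-sigma error bar).
    """
    z = (estimate - threshold) / sigma
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    print(prob_at_least(2600.0, 100.0))
    print(prob_at_least(2500.0, 100.0))
```

The two sides then only need to agree on the threshold and on how large this
probability has to be before the claim counts as demonstrated.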

>What is GM strength?  Maybe you
>can come up with a number which would satisfy most people or at least yourself.
>It's kind of like fuzzy logic.

Let's use the definition of 2500 ELO against the same category of talent that is
necessary to obtain a GM norm.  The games must be at 40/2 and the games must be
under tournament conditions.  Indeed, a precise definition of what we are trying
to prove is crucial to being able to prove it.

>It becomes an easier and simpler way to arrive
>at the answer without demanding you go exactly where you want to go on the first
>try.  It's obvious that computers will never hold a GM title because has made
>this much more difficult for computers than humans.  So the only thing I know to
>do is to come up with some figures which most people agree is equal to a GM.  If
>you can't do this then you may never agree that computers are at last equal to a
>GM even when computers are beating the pants off of GMs.
>So what I was suggesting was to take the last X number of games by computers vs
>GMs and treat them as one player.

This is invalid.

> If this "Average" computer is of GM strength
>then seems to me we have some GM strength computers.

How does one quantify "it seems to me" mathematically?

>If they don't measure up
>now then we have not proven that there are no GM computers but at least we prove
>that as a whole they are not there yet.  Of course you would want to choose the
>best few computers which will give you enough games vs humans to establish yes
>or no. (Not a C64) Say if it takes 40 or 50 games to satisfy you that computers
>have reached GM strength then use as many of the top computer vs human games you
>need to get the 40 or 50 games.  So the bottom line is if you can't decide how
>many games it takes and what rating is equal to a GM then you will never answer
>the question.

The number of games is easily decidable, but is also a function of the
competition.  The better known the ELO of the competition, the more accurate
the rating will be for the new player to be evaluated.  If the opponents have
played thousands of rated games, then they will be supremely useful tools for
that evaluation.  If you look at the output of ELOSTAT (for instance) you will
see a + and a - figure for ELO value.  That represents the error bar of the
calculation for one standard deviation.  That means that there is about a 2/3
probability that the actual mean lies between those two values, and about a 95%
chance that it lies within a bar of double that width.
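
The coverage of a one-sigma bar and of a bar of double that width can be
checked directly, assuming the rating estimate is approximately normally
distributed.  The helper name coverage is made up for this sketch.

```python
import math


def coverage(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean (a two-sided interval)."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    print("within 1 sigma:", round(coverage(1.0), 4))
    print("within 2 sigma:", round(coverage(2.0), 4))
```

Under that assumption the one-sigma bar covers about 68% and the doubled bar
about 95%.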

> But if you can do that then maybe you can have the answer
>already.

Knowing how to formulate the question properly does not mean that we already
have the answer, but it is a crucial first step.

>Or maybe you're not interested in the answer but just like to argue.

Passing judgement on someone's intent is always a sure sign that you have run
out of useful arguments.  I don't particularly like to argue, but if I think
that someone is wrong, then I will say that I think they are wrong and I will
tell the reasons why.

I don't see anything particularly onerous or evil in that.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
