Computer Chess Club Archives


Subject: Re: Last SSDF Rating List

Author: Bruce Moreland

Date: 20:46:29 03/28/99


On March 28, 1999 at 21:20:23, Timothy J. Frohlick wrote:

>Thanks Paulo,
>
>I have noticed, as did Bruce Moreland, that CM6000 played only 100 games against
>opponents averaging 2363.  The other programs played over 500 games against
>opponents one hundred points higher.  I really don't believe that CM6000
>is as good a program as some would suggest. I have a copy and it beats me in
>about 50 to 60 moves but when I play Rebel 10c I go down in 25 to 30 moves.
>
>So, you CM6000 aficionados should get real and accept CM6000 as a good program
>that does not rise to the level of great programs like Fritz 5.32 or Rebel 10.
>My favorite program is MChessPro ver.8 because of its database functions and
>its fabulous opening book. It also beats me by the 30th move, i.e., my position
>is in shambles.
>
>Let's all be a little more scientific around here.

I noticed that the opponent rating was a lot lower and asked why.  I was told
that it had played a lot of standalone machines.

If you are a true 2400 player and you play a match against another true 2400
player, you should score exactly 50%.

If you are a true 2400 player and you play a match against a true 2200 player,
you should score 75.97%.

So of course what this means is that if you score 75.97% against a 2200 player,
you should be rated 2400, and likewise if you score 50% against a 2400, you
should be rated 2400.
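For anyone who wants to check those two figures, here is a minimal sketch of the arithmetic. It assumes the common logistic form of the Elo curve with a 400-point scale, which is the form that reproduces the 75.97% figure; the function names are made up for illustration.

```python
import math

def expected_score(rating, opponent_rating):
    """Expected score (0..1) under the logistic Elo curve (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))

def performance_rating(score, opponent_rating):
    """Rating implied by a long-run score fraction against one opponent rating."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(score / (1.0 - score))

even = expected_score(2400, 2400)            # equal ratings: 0.5
favored = expected_score(2400, 2200)         # 200 points up: about 0.7597
implied = performance_rating(favored, 2200)  # inverts back to about 2400
```

The second function is just the first one solved for the rating, which is the sense in which either result "should" produce the same 2400.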

All of this assumes impractically long matches, of course, in order to remove
the possibility of error.
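To put a rough size on that error, the sketch below turns a score over a finite number of games into a range of rating differences. It is an illustration under stated assumptions, not anything the SSDF does: it uses the same logistic Elo curve, a normal approximation, and a per-game variance of p(1-p), which ignores draws and so overstates the spread somewhat. The 60% score is a made-up example, and the 100 and 500 game counts are only there to mirror the match lengths mentioned in the quoted message.

```python
import math

def elo_diff(score):
    """Rating difference implied by a score fraction (logistic Elo curve)."""
    return 400.0 * math.log10(score / (1.0 - score))

def rating_interval(score, games, z=1.96):
    """Approximate range of rating differences for a score over `games` games.

    Assumes independent games with variance score*(1-score), i.e. no draws,
    and a normal approximation; z=1.96 gives roughly a 95% range.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    low = max(score - z * se, 1e-9)         # clamp so the log stays defined
    high = min(score + z * se, 1.0 - 1e-9)
    return elo_diff(low), elo_diff(high)

# Hypothetical 60% score over a short and a long match.
low100, high100 = rating_interval(0.60, 100)
low500, high500 = rating_interval(0.60, 500)
width100 = high100 - low100
width500 = high500 - low500
```

The point estimate is the same in both cases, but the range around it is much wider for the shorter match.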

So really it shouldn't matter what quality of opposition you face; your scoring
percentage against them should dictate your rating.  This is an essential
element of the Elo rating system, the notion that you can compute an accurate
rating from games played with anyone else who has an accurate rating.

I am not certain that this is how computer chess really works.  The Thompson
experiment, in which he played his program searching at depth N against itself
searching to depth M, for various values of M and N, has been widely criticized.

Superficially there is no reason to criticize it.  There is a program playing
against another program, and they produce an outcome.  The Elo formula should
work.

But there is criticism, and it is because the experiment is inbred: people
have said that you can't learn anything significant by playing against a
different version of yourself.

I don't know why this should be so, but if it is so, then I suggest that it
might be so for *different* computers as well.  If the experiment is bogus for
identical programs, and somehow not bogus for different programs, then perhaps
it is somewhat bogus if two programs are similar.  Maybe a lot of them are
similar, I don't know.  Maybe the effective similarity increases when you put
one of them on massively better hardware.  I don't know.

The reason I am not interested in testing this is that I am not a student, so I
am not looking for a nice research project that I can use to get a degree, and I
am not an academic, so I don't really want to spend the time creating something
to publish.

But I think this may be worth studying if someone is really interested in such
arcane stuff.

What I see is a different testing procedure for this program than for any other
program -- the opponents were apparently unlike the opponents the other programs
faced -- and I propose above that there is a chance that this testing procedure
would produce an invalid result.

Essentially, the opponents in various matches seem to be chosen by criteria
that aren't controlled.

I don't really care how strong CM6K is.  Personally, I have no interest in who
is on top of the Swedish list, but for those of you who do, you may wish to
reconsider becoming hysterical about these numbers.  They are numbers, and
numbers don't lie, but conclusions drawn from numbers can be completely wrong.

This won't work, obviously.  Feel free to ignore me and become hysterical and
shout at each other and thump your chests about which program is supposedly best
for the next few weeks.

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.