Computer Chess Club Archives


Subject: Re: SSDF Fritz 6 K6-2 - Shredder 2 P200MMX game 7-11/40 Now: 9,5 - 1,5

Author: Bruce Moreland

Date: 14:49:20 02/23/00


On February 23, 2000 at 15:01:14, Bertil Eklund wrote:

>On February 23, 2000 at 12:33:50, Bruce Moreland wrote:
>
>>On February 23, 2000 at 11:08:43, blass uri wrote:
>>
>>>shredder2 was not tested on the fast hardware because the ssdf always use fast
>>>hardware for new programs and old hardware for old programs.
>>
>>Has anyone considered that this might be a major source of error, perhaps rating
>>inflation?
>
>Why? Any suggestions of what to do instead. Do you think humans should refuse to
>play opponents rated 200 elo higher or lower.

There is a very major assumption buried in the Swedish list, the assumption that
these ratings have some correlation with ratings on the human list.

The very best way to make the ratings correlate with the human list would be to
have the programs play against a variety of humans.

Instead the games are played exclusively between machines.

If you speed up a program's hardware, the program will become stronger against
other computers; you have ample evidence of this.  But it is not a foregone
conclusion that the program will become stronger by the same amount against
humans.
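
For reference, here is a minimal sketch of the standard Elo expected-score formula and its inverse, which is how a match score between two machines turns into a rating gap. The function names are illustrative and are not taken from anything the SSDF is known to use; the 9.5 - 1.5 score comes from the subject line of this thread.

```python
import math

def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def implied_rating_gap(points, games):
    """Rating difference under which `points` out of `games` is the expected result."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("gap is undefined for a 0% or 100% score")
    return 400.0 * math.log10(p / (1.0 - p))

# Score from the thread subject: 9.5 points out of 11 games.
gap = implied_rating_gap(9.5, 11)
print(round(gap))
```

The gap this produces describes only how the two programs score against each other on that hardware. Nothing in the formula says the same gap would show up against human opponents.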

The programs don't differ that much from each other, and it is possible that
when you increase hardware, you allow the faster player to become a superset of
the slower one.  It sees the same stuff, just better and faster.  What is the
result of this?  I don't know, but it is possible that the effect is more
extreme than should be expected.

Against humans, it could still be missing the same stuff that it missed before,
if it has a problem with long-term strategic issues.  The humans score against
the programs differently.  They have different strengths and weaknesses, and
hardware increases may not increase the computer's strengths significantly or
reduce its weaknesses.

>>"Always" is a bad word to use when you are trying to get an accurate result from
>>a system that is designed to produce accurate ratings within a pool where
>>everyone plays everyone under the same average conditions.
>
>Have you ever thought about that the human pool works in the same way, except
>for being much bigger?

Humans play against other humans, and there is little interest in correlating
the human ratings with computers, whereas there is much interest in doing the
reverse.  People want to take the ratings on the SSDF list and compare them with
human ratings.  Since the two pools have been distinct for a long time, it is
possible that they have evolved apart.  If you don't measure this, then the fact
that the numbers seem to "look" right proves nothing in this respect.
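
One way to see why a closed pool cannot anchor itself: under the standard Elo formula, predictions depend only on rating differences, so shifting every rating in the pool by the same constant changes nothing that intra-pool games can detect. The sketch below assumes that formula; the program names, ratings, and the 150-point shift are made up for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Hypothetical machine-only pool; names and numbers are invented.
pool = {"prog_a": 2500.0, "prog_b": 2400.0, "prog_c": 2250.0}

# The same pool with every rating moved down by an arbitrary constant.
shifted = {name: rating - 150.0 for name, rating in pool.items()}

def all_predictions(ratings):
    """Expected score for every ordered pairing inside the pool."""
    names = sorted(ratings)
    return [expected_score(ratings[a], ratings[b])
            for a in names for b in names if a != b]

# Every intra-pool prediction is identical before and after the shift.
same = all(abs(x - y) < 1e-12
           for x, y in zip(all_predictions(pool), all_predictions(shifted)))
print(same)
```

Only games that cross the two pools, machines against rated humans, could distinguish the original numbers from the shifted ones.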

A mathematical process that is used to measure and predict must have some basis
other than that it seems to do OK so far based upon our own feelings about the
issue.

>There is a slight inflation because some older programs have no
>learning-function.

The newer ones have private access to the older ones, and not vice versa, but
this is another issue.

>Of course the pool should be calibrated but not because of someones gut-feelings
>or wild guesses.

If someone wants to publish a scientifically valid list, they should expect
questions from outsiders about the methods used, and react to them a little
better than this, I think.  Perhaps I am wrong, but I have pointed out what I
think is a potential problem with your list.  You can ignore me if you wish; I
suffer not at all if you do so.

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700
