Computer Chess Club Archives



Subject: Re: Take all program results at 40/2 vs Grandmasters you get 2500+ easily

Author: Howard Exner

Date: 09:52:36 10/03/99



On October 03, 1999 at 12:25:09, James B. Shearer wrote:

>On October 03, 1999 at 11:52:43, Howard Exner wrote:
>
>>On October 03, 1999 at 09:17:38, Georg v. Zimmermann wrote:
>>
>>>The game against Hoffmann should _of course_ be counted. Say, what happens if I
>>>play a game with a cold in a tournament?
>>
>>
>>This is not quite the analogy that comes to mind. A computer that shorts out or
>>has a power failure is more like a person having a stroke or a blackout
>>during the game, or maybe being bonked over the head and knocked
>>unconscious.
>>
>>Of course, if you are concerned only about the game score, then even
>>someone dying at the chess table will not matter. But in the GM challenge
>>the point is seeing how a computer program plays vs a human. Otherwise we may
>>find ourselves with a rash of posts: "I beat Crafty in 10 moves!" When asked
>>by the enquiring minds here on CCC, "How did you do that?", you could
>>simply reply, "The power went out in my house, it refused to move, so it
>>lost on time! Yippee, my rating just shot up!"
>
>          Rebel did not lose on time.
>          Obviously the game should count.  In any scientific experiment,
>arbitrarily throwing out data points is forbidden because it can easily
>introduce biases that destroy the validity of the results.
>Any points thrown out should be on the basis of a protocol established
>before the experiment starts.

Protocol is important in gathering data. What is the protocol for the GM
challenge? What is being measured? If what is being measured is the game
result - ones and zeros - then the game is binding, as it falls within
the protocol. If what is being tested is a program running on healthy hardware,
then the protocol falls short. Isn't the spirit of the challenge the
moves of the game? What is the GM trying to exploit? How is the computer
handling this strategy? Is the game a lopsided crush? Is the GM working hard
to press for victory? All the drama of the game seems to me the point of
the GM challenge, and a game seems void if the hardware is sputtering.

Take for example the SSDF protocol. When it comes to their attention that faulty
settings were used (book, selectivity, default), they are quick to toss the games
out as invalid. Other comp-comp testers do the same.

Where is the line drawn for the Rebel - Hoffmann game? Numerous reboots of the
hardware OK? Complete power outage OK? The human suffering a stroke OK?

I do respect the opinions of those who want to count the game as a valid
measure of Rebel's strength vs a Grandmaster. Consensus is not important.
Once a larger number of games has been played, the gap between the two
ratings (counting Hoffmann or not) will diminish anyway.
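
To put rough numbers on that last point, here is a minimal sketch in Python. It uses the common linear approximation of a performance rating (average opponent rating plus 400 times wins minus losses, divided by games played), not whatever formula is actually used for the GM challenge. The function names, the 2500 average and the scores are made up for illustration, and the disputed game is assumed to be a loss against an opponent rated at the average.

```python
def performance_rating(avg_opponent_rating, score, games):
    """Linear approximation: Ra + 400 * (wins - losses) / games.

    `score` counts a win as 1 and a draw as 0.5, so
    wins - losses equals 2 * score - games.
    """
    return avg_opponent_rating + 400.0 * (2.0 * score - games) / games


def spread_from_one_loss(avg_opponent_rating, score, games):
    """Rating without one disputed loss minus rating with it counted.

    Assumption: the disputed opponent is rated at the same average,
    so counting the game adds one game and zero points.
    """
    without = performance_rating(avg_opponent_rating, score, games)
    with_loss = performance_rating(avg_opponent_rating, score, games + 1)
    return without - with_loss


if __name__ == "__main__":
    # Hypothetical figures only: a 50% score against a 2500 average.
    for games in (9, 39, 99):
        gap = spread_from_one_loss(2500, games / 2.0, games)
        print(f"{games} undisputed games: gap of {gap:.1f} points")
```

Under these assumptions a 50% score over 9 undisputed games gives a gap of 40 points, and over 39 games the gap is down to 10. In general the gap shrinks roughly in proportion to one over the number of games.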


>Experience has shown humans are generally incapable of making objective
>decisions about this sort of thing.  That is why double blind experiments were
>invented.
>                           James B. Shearer


