Computer Chess Club Archives



Subject: Re: Next Human vs Computer ratings list - I need opinions

Author: Robert Hyatt

Date: 15:25:07 05/19/00



On May 19, 2000 at 13:29:05, Ed Schröder wrote:

>On May 19, 2000 at 13:26:26, Robert Hyatt wrote:
>
>>On May 19, 2000 at 12:41:57, Enrique Irazoqui wrote:
>>
>>>On May 19, 2000 at 12:22:05, Ed Schröder wrote:
>>>
>>>>On May 19, 2000 at 11:05:45, Enrique Irazoqui wrote:
>>>>
>>>>>On May 19, 2000 at 10:58:57, Robert Hyatt wrote:
>>>>>
>>>>>>On May 19, 2000 at 10:27:04, blass uri wrote:
>>>>>>
>>>>>>>On May 19, 2000 at 09:42:07, Enrique Irazoqui wrote:
>>>>>>>
>>>>>>>>On May 19, 2000 at 09:37:19, Chris Carson wrote:
>>>>>>>>
>>>>>>>>>I am planning to publish an updated list here with
>>>>>>>>>all rated human vs computer results for 40/2 events.
>>>>>>>>>
>>>>>>>>>Please let me know your thoughts on the following:
>>>>>>>>>
>>>>>>>>>1.  Exclude Performance Rating when 3 or fewer games
>>>>>>>>>    have been played by a program/hardware.
>>>>>>>>
>>>>>>>>I don't see why.
>>>>>>>>
>>>>>>>>>2.  Exclude forfeits and protest resignations (Dutch Championship),
>>>>>>>>>    and games where computers lost due to hardware, IP failures,
>>>>>>>>>    or operator error.
>>>>>>>>
>>>>>>>>I would definitely exclude forfeits and IP failures, but not the rest. In my
>>>>>>>>opinion, this list is interesting if it reflects the real performance of
>>>>>>>>programs in actual games. Hardware failures and operator's errors are part of
>>>>>>>>how a program plays. Forfeits and IP failures are not.
>>>>>>>>
>>>>>>>>Enrique
>>>>>>>
>>>>>>>Do you really think that losing on time is part of how shredder4 plays?
>>>>>>>
>>>>>>>I do not agree.
>>>>>>>I think that operator errors are not part of how a program plays and it is not
>>>>>>>fair to include the game that Shredder lost on time in a winning position when
>>>>>>>the reason was not a bug in the program.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>Depends on your definition of "How Shredder plays".  If you mean how it plays
>>>>>>in human events, then the answer is "yes".  Because the operator _will_ make a
>>>>>>mistake here and there.  Resigning when there is a deep saving move that the
>>>>>>program might have played without understanding it.  Losing time on the clock
>>>>>>by going to the bathroom.  Etc. The human operator _is_ part of the "system"
>>>>>>until we start using robots controlled by the computer.
>>>>>>
>>>>>>I have made mistakes (as an operator) that ended up costing Cray Blitz a game
>>>>>>here and there.  In the WMCCC event in Jakarta, the operator misunderstood how
>>>>>>to set the time control and set it for 40 moves in 2 days, not 40 moves in 2
>>>>>>hours.  We lost the first game that way.  If you have a human in the loop, then
>>>>>>he has to be factored in.  As do hardware failures, which _do_ happen in games.
>>>>>>
>>>>>>In fact, bleeding edge hardware is dangerous to use for this reason.
>>>>>
>>>>>This was my first reaction too, but I remember reading here that the operator of
>>>>>Shredder in the last round of the Israeli league lost on time almost on purpose,
>>>>>making telephone calls, not caring about the program, etc. So it is an
>>>>>exceptional case that in my opinion makes the game irrelevant for rating
>>>>>purposes.
>>>>
>>>>I understand the point you are making. The very same thing happened in
>>>>the 2 games Rebel8 played against GM Ralf Akesson. Rebel8 won the first
>>>>game and lost the second game on time due to an operator error in a
>>>>promising position. Make an exception? No way IMO. Next thing, a GM
>>>>loses on time in a won position because his wife gave birth and he went
>>>>home. The list of exceptions soon becomes endless. We need a clear rule.
>>>
>>>Sure, but if the purpose of this rating list is to give us an idea of the
>>>strength of programs, I would discard games that we know are meaningless, like
>>>the 2 forfeits of Fritz in Holland and this Shredder game. The key word, to me,
>>>is "meaning", and this game has none. The list may be more complicated, but also
>>>more accurate.
>>>
>>>Enrique
>>>
>>>>Ed
>>>>
>>>>
>>>>>Enrique
>>
>>
>>Forfeits are "non-events" since they weren't played (even the 4 mover was a
>>non-game for obvious reasons).  But operator errors are part of the computer
>>"system".  And you can't always have a "perfect" operator.  The best solution
>>is that the author is the _only_ one that operates.  As he is the least likely
>>to make an error that influences the game outcome.  But as soon as you use other
>>operators, the probability of error increases.  And as it increases, the chances
>>that the program will perform somewhat below "expectation" become greater.
>>
>>But that is part of the "system" IMHO.  Otherwise you start with "It lost that
>>game due to a power failure that lost some pondering time."  "It lost this game
>>due to a hardware glitch that made me reboot and restart, losing information."
>>"It lost that game due to an operator typo that made it have to back up and lose
>>stuff."  "It lost that game because the hardware crashed and wouldn't come back
>>up."
>>
>>Etc.
>>
>>All of those are part of computer chess.  The human has his own set of problems
>>to contend with, and he can't escape them either.
>
>Totally in agreement.
>
>Ed


Here is an example of a potential problem:

A few years ago we were playing in the Fredkin match.  I was operating CB
on a Cray in Minneapolis.  Cray's phone system went down for reasons unknown
and I couldn't connect.  I called Harry Nelson who had made arrangements to
use one of Livermore's computers as a backup if we had problems.  He restarted
the game, keyed in the first 20 moves and off we went.  After about 40 moves,
with both players pretty close on time, Harry made an error.  He tried
restarting, but he was using a 'modified CB' of his and it corrupted the restart
file.  He tried typing in all the moves, but naturally he had not been careful
in writing them down.  No Go.  Time running very low now.  He tried the usual
"setboard command" similar to what we had in Crafty.  Unfortunately he had
somehow broken it in his modified version.  While we struggled to resume, our
flag fell.

Should we have been penalized?  Of course.  That is part of the deal.  Humans
fail.  Programs fail. Hardware fails.  Communication fails.  All play a part
based on the current rules in effect.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
