Computer Chess Club Archives


Subject: Re: Rebel Performance Rating

Author: Stephen A. Boak

Date: 15:30:02 09/19/99

On September 19, 1999 at 12:09:21, Howard Exner wrote:
>
>Can we really compare what Rebel is doing to how GMs attain their status?
>The format of this series of games makes it totally unique, in the sense
>that no human GM was "measured" this way--i.e., having to face opponents
>with months of prep time. It may be safer not to extrapolate too much from this
>event. I think it should just stand on its own: Rebel achieved such and
>such a rating based on the parameters outlined by the event.
>

I agree in part; but please note that in real life nothing is ever exactly the
same from test to test, yet we still draw general conclusions, more or less.

For example, I agree that generalizing or concluding too much from one specific
(rather short) set of tests or circumstances is overdoing it.  The issue is--
what counts as too much?  That isn't clear.

Nevertheless, results from one set of tests under a particular set of
conditions or circumstances are *some* measure of performance, even if not
conclusive.  The issue is how much confidence we have in the generalizations or
conclusions we draw.

Rebel Century lost to Rohde in their first match, despite a promising position
(my opinion--meaning I think Rebel had good chances to draw, or perhaps even
win, especially if Rohde made some small errors).  Rebel played some poor,
specific moves--and lost.  The king-safety algorithm was reportedly either
somewhat disabled or not working well in that version of Rebel.  Ed Schroeder
apparently fixed or at least improved it (though only time will tell for sure);
indeed, later games have not shown the same problem to the same degree.

Part of the goal of the Rebel team is to weed out such weaknesses in their
program and improve it.  Let's assume they did just that.  The issue now
is--should we average the results (for example, TPR) of later games (in which
Rebel used an improved king-safety algorithm) with the prior Rohde game (played
with a worse algorithm)?  There is no right answer--it depends on what we are
trying to show.

In the Hoffman match, the Rebel software/hardware system apparently crashed 10
times or so.  A couple of Rebel moves were not verifiable by the control
software copy sent to the arbiter.  The Rebel software version reloaded after
some of the crashes may not have been the version the game was started with (I
think I read this).

This game was lost by Rebel under trying and unusual circumstances that perhaps
did not test the true chess-playing ability of Rebel (at least the latest
version), which was one real goal of the match.  I say perhaps, because Rebel
Century might have played nearly all the same moves as in the actual game even
if the system hadn't crashed, and maybe the result would still have been a
Rebel loss.  For example, maybe Rebel was simply outplayed by Hoffman in a
beautiful example of how a GM handles the Benko Gambit against a program that
doesn't have the same GM-level understanding of that opening.  However, I don't
really know for sure, since some unusual things happened during the play of
that game (the crashes, the unverifiable moves).

Also, I don't know how the program could properly manage its time consumption,
as it normally would, when it lost time (and perhaps even track of the actual
remaining time) after each crash.  I am unsure which moves were played poorly
due to hardware/software failure of calculating ability, and which were played
too fast--seemingly 'immediately,' in some cases during the match--after
reloading from a crash.  Rebel's moves might have been played differently, and
perhaps better, if the software had known it had more remaining time to think
and had in fact taken longer to analyze its next move.

The game and result happened.  I won't ignore that.  However, I am willing to
toss out this game as not indicative of the goal--measuring how well Rebel
Century plays at standard time controls against humans.  How much would tossing
out the single Hoffman game affect the calculated TPR of Rebel?

With the Hoffman game included, as well as the two Anand games played prior to
the Rebel GM Challenge series, the Rebel overall Tournament Performance Rating
(TPR) is 2480, as indicated by another poster.

If the Hoffman game is excluded (on the reasoning that hardware/software
crashes tainted the play and *possibly* the game result, rendering that result
not indicative of the true strength of Rebel software against humans at
tournament time controls), the Rebel TPR is 2522.
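
For anyone who wants to check the arithmetic, the usual quick approximation of
a performance rating is the average opponent rating plus 400*(wins - losses)
divided by the number of games.  The short Python sketch below uses made-up
opponent ratings and scores (the actual ratings of the Challenge opponents are
not listed in this post) just to show how dropping a single loss shifts the
computed number:

# A minimal sketch of the linear performance-rating approximation:
#   TPR ~= average opponent rating + 400 * (wins - losses) / games
# All ratings and results below are hypothetical placeholders, not the
# actual figures from the Rebel GM Challenge.

def tpr(games):
    """games: list of (opponent_rating, score), score in {1.0, 0.5, 0.0}."""
    n = len(games)
    avg_opp = sum(rating for rating, _ in games) / n
    wins = sum(1 for _, score in games if score == 1.0)
    losses = sum(1 for _, score in games if score == 0.0)
    return avg_opp + 400 * (wins - losses) / n

# Hypothetical five-game sample; the first entry stands in for a single
# loss, like the Hoffman game.
all_games = [(2550, 0.0), (2650, 0.5), (2650, 0.5), (2500, 1.0), (2450, 0.5)]
print(round(tpr(all_games)))       # TPR over all games
print(round(tpr(all_games[1:])))   # TPR with the single loss excluded

Note that FIDE's official performance-rating calculation uses a conversion
table rather than this linear formula, but the direction of the effect is the
same: excluding a loss raises the computed TPR.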

As implied by other posters, the exact Rebel chess strength against humans at
standard time controls (TPR or whatever calculation), on relatively fast PC
hardware, is a detail that is not all that important (it is *near* 2500, as
shown by these particular games).

What seems more relevant (and a broader conclusion), in the attempt to evaluate
the true strength of Rebel against humans at standard time controls, is that
testing so far shows Rebel to be at or near IM strength at a minimum, and
possibly able to perform at GM strength with some reasonable further testing
and/or improvement.

I am not so sure that additional hardware improvements (speedups of CPUs) will
quickly push Rebel or other chess programs into undisputed GM strength.  If the
positional or strategic understanding (so-called intelligence) that guides the
programs isn't good enough, the move selections may still be somewhat artificial
and *planless* relative to a GM's way of thinking, and the strength of the
program may not improve dramatically due solely to further doublings (for
example) of hardware speed.

On the other hand, as other posters have implied or stated, if Rebel, against
humans, guided the play into book opening lines that favor the program's
strengths, and successfully used special algorithms (anti-GM algorithms, for
example) to seek and maintain such positions of advantage and thereby outplay
the GM, it might reach a performance level where the GM-strength assessment is
not so hotly debated.  I think it will be years before we generally agree that
programs *plan* as well as GMs, although they may search well enough to outplay
the GMs at times and hold an even footing against them.

--Steve Boak


