Computer Chess Club Archives


Subject: Re: Rebel Performance Rating

Author: Howard Exner

Date: 21:06:27 09/20/99


On September 19, 1999 at 18:30:02, Stephen A. Boak wrote:

>On September 19, 1999 at 12:09:21, Howard Exner wrote:
>>
>>Can we really compare what Rebel is doing to how GMs attain their status?
>>The format of this series of games makes it unique in the sense that no
>>human GM was "measured" this way, i.e., having to face opponents with
>>months of prep time. It may be safer not to extrapolate too much from this
>>event. I think it should just stand on its own: Rebel achieved such and
>>such a rating based on the parameters outlined by the event.
>>
>
>I agree in part; but please note that in real life nothing is ever exactly the
>same from test to test, yet we still draw general conclusions, more or less.
>
>For example, I agree that generalizing or concluding too much from one specific
>(rather short) set of tests or circumstances is overdoing it.  The issue is--
>what is too much?  This isn't clear.
>
>Nevertheless, results from one set of tests under a particular set of
>conditions or circumstances are *some* measure of performance, even if not
>conclusive.  The issue is how much confidence we have in the generalizations or
>conclusions we draw.
>
>Rebel Century lost to Rohde in their first match, despite a promising position
>(my opinion--meaning I think Rebel had good chances to draw, or perhaps even
>win, especially had Rohde made some small errors).  Rebel played some poor,
>specific moves--and lost.  The king-safety algorithm was reportedly either
>somewhat disabled or not working well in that version of Rebel.  Ed Schroeder
>apparently fixed it, or at least improved it (though only time will tell for
>sure); indeed, later games have not shown the same problem to the same degree.
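
For context, a king-safety term in an engine's evaluation typically penalizes a
weakened pawn shelter and open lines near the king. Here is a minimal generic
sketch in Python (with made-up weights; not Rebel's actual code, which is not
public) of what such a term looks like:

# Generic king-safety evaluation term, illustrative only.
# Penalize missing shelter pawns and open files next to the king; if such
# a term is disabled or mis-weighted, the engine drifts into positions
# where its king comes under attack.
SHELTER_PENALTY = 15    # centipawns per missing shelter pawn (made-up weight)
OPEN_FILE_PENALTY = 25  # centipawns per open file near the king (made-up weight)

def king_safety(missing_shelter_pawns, open_files_near_king):
    return -(SHELTER_PENALTY * missing_shelter_pawns
             + OPEN_FILE_PENALTY * open_files_near_king)

print(king_safety(0, 0))  # intact shelter: 0
print(king_safety(2, 1))  # weakened king: -55 centipawns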
>
>Part of the goal of the Rebel team is to weed out such weaknesses in their
>program, and improve it.  Let's assume they did just that.  The issue now
>is--should we average the results (for example, TPR) of later games (in which Rebel
>used an improved king-safety algorithm) with the prior Rohde game (with a worse
>algorithm)?  There is no right answer--it depends on what we are trying to show.
>
>In the Hoffman match, the Rebel software/hardware system apparently crashed 10
>times or so.  A couple of Rebel moves were not verifiable by the control copy
>of the software sent to the arbiter.  The software version reloaded after some
>of the crashes may not have been the version the game was started with (I think
>I read this).
>
>This game was lost by Rebel under trying and unusual circumstances that perhaps
>did not test the true chess playing ability of Rebel (at least the latest
>version), which was one real goal of the match.  I say perhaps, because Rebel
>Century might have played nearly all the same moves as the actual game even if
>the system hadn't crashed, and maybe the result would still have been a Rebel
>loss.  For example, maybe Rebel was simply outplayed by Hoffman in a beautiful
>demonstration of how a GM handles the Benko Gambit against a program that lacks
>the same GM-level understanding of that opening.  However, I don't
>really know for sure, since there were some unusual things that happened during
>the play of that game (the crashes, the unverifiable moves).  Also, I don't know
>how the program could properly manage its time consumption as it normally
>would, especially when it lost time (and perhaps even track of the actual
>remaining time) after each crash.  I am unsure which moves were played poorly
>because a crash degraded the calculation, and which were played too fast,
>seemingly 'immediately' in some cases during the match, after reloading from
>a crash.  Rebel's moves might have been different, and perhaps better, if the
>software had known it had more remaining time to think, and had in fact taken
>longer to analyze its next move.
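
For context, a typical engine allocates per-move thinking time from its view of
the remaining clock time, along the lines of the generic Python sketch below (an
illustration of the general idea only, not Rebel's actual scheme). If a crash
resets the engine's notion of remaining time to a smaller value, the per-move
budget shrinks and it moves almost instantly:

# Generic per-move time allocation, illustrative only (not Rebel's scheme).
def time_for_move_ms(remaining_ms, moves_to_go=30, safety_ms=500):
    # Keep a small safety margin, then spread the rest over the moves
    # expected before the next time control.
    budget = max(remaining_ms - safety_ms, 0)
    return budget // max(moves_to_go, 1)

print(time_for_move_ms(30 * 60 * 1000))  # fresh clock: ~60 s per move
print(time_for_move_ms(2 * 60 * 1000))   # misread post-crash clock: ~4 s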
>
>The game and result happened.  I won't ignore that.  However, I am willing to
>toss out this game as not indicative of the goal--measuring how well Rebel
>Century plays at standard time controls against humans.  How much would tossing
>out the single Hoffman game affect the calculated TPR of Rebel?
>
>With the Hoffman game included, as well as the two Anand games played prior to
>the Rebel GM Challenge series, the Rebel overall Tournament Performance Rating
>(TPR) is 2480, as indicated by another poster.
>
>If the Hoffman game is excluded (on the reasoning that hardware/software
>crashes tainted the play and *possibly* the game result, rendering that result
>not indicative of the true strength of Rebel software against humans at
>tournament time controls), the Rebel TPR is 2522.
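
To make the arithmetic concrete, a TPR can be computed with one common linear
approximation: average opponent rating plus 400 times (wins minus losses)
divided by the number of games. A small Python sketch, using placeholder
ratings and results rather than the actual GM Challenge data:

# Tournament performance rating via a common linear approximation:
#   TPR = average opponent rating + 400 * (wins - losses) / games
# Ratings and scores below are placeholders, not the GM Challenge data.
def tpr(results):
    """results: list of (opponent_rating, score), score in {1.0, 0.5, 0.0}."""
    n = len(results)
    avg_opp = sum(rating for rating, _ in results) / n
    wins = sum(1 for _, score in results if score == 1.0)
    losses = sum(1 for _, score in results if score == 0.0)
    return avg_opp + 400 * (wins - losses) / n

games = [(2550, 1.0), (2600, 0.5), (2500, 0.0)]       # includes one loss
print(round(tpr(games)))                              # 2550
print(round(tpr([g for g in games if g[1] != 0.0])))  # loss excluded: 2775

As with the 2480 vs. 2522 figures above, excluding a lost game raises the
computed number.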
>
>As implied by other posters, the exact Rebel chess strength against humans at
>standard time controls (TPR or whatever calculation), on relatively fast PC
>hardware, is a detail that is not all that important (it is *near* 2500, as
>shown by these particular games).
>
>What seems more relevant (and a broader conclusion), in the attempt to evaluate
>the true strength of Rebel against humans at standard time controls, is that
>testing so far shows that Rebel is at or near IM strength at a minimum, and
>could possibly perform at GM strength with some reasonable further testing
>and/or improvement.
>
>I am not so sure that additional hardware improvements (speedups of CPUs) will
>quickly push Rebel or other chess programs into undeniable GM strength.  If the
>positional or strategic understanding (so called intelligence) that guides the
>programs isn't good enough, the move selections may still be somewhat artificial
>and *planless* relative to a GM's way of thinking, and the strength of the
>program may not improve dramatically due solely to further doubling (for
>example) of hardware speed.
>
>On the other hand, as other posters have implied or stated, if Rebel, when
>playing humans, steered the play into book opening lines that favor the
>program's strengths, and successfully used special algorithms (for example,
>anti-GM) to seek and maintain such positions of advantage and thereby
>outplay the GM, it
>might reach a performance level where the GM strength assessment is not so hotly
>debatable.  I think it will be years before we generally agree that programs
>*plan* as well as GMs, although they may search well enough to outplay the GMs
>at times and hold an even footing against them.
>
>--Steve Boak

Much of the above coincides with my thinking as well. One good indicator from
the GM Challenge is how computers do in match play. As with humans, some may
be better in tournament play while others do better in matches. Where would
computers fall? I'd wager that most would expect computers to do better in
tournaments (that is, if they were given more of a chance to enter these
tournaments).


