Computer Chess Club Archives



Subject: Re: Will the Rebel Vs IM Russek match be counted in calculating Rebel's elo?

Author: Stephen A. Boak

Date: 12:19:22 12/24/99


On December 24, 1999 at 05:48:58, John Warfield wrote:

>
>
>  Since these games are 40/2, will this count in predicting Rebel's rating? I
>assume that it will count.

Hi John,

a bit of philosophy here (rhetorical questions, no need to answer)--

To whom do you address your question?

Who can/should determine what will 'count'?

Anybody can calculate (or predict) SSDF, FIDE, USCF, ELO-type, or TPR ratings,
using whatever data they have available, for any chess program.  Just get the
test data (including opponent ratings, conditions and results), grab a pencil
and some paper, or computer or calculator, and do the mathematics.
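As a rough illustration of "doing the mathematics", here is a minimal sketch of a performance-rating (TPR-style) calculation. It assumes the standard logistic Elo expectancy curve and solves for the rating whose expected score matches the actual score. That is only one of several conventions: FIDE's published method uses a lookup table, and the SSDF has its own procedure. The function names and the opponent ratings in the usage lines are hypothetical, not data from any Rebel match.

```python
def expected_score(rating, opponent):
    """Expected score per game under the logistic Elo model (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(opponent_ratings, score, lo=0.0, hi=4000.0, tol=0.01):
    """Rating at which the summed expected score equals the actual score.

    Solved by bisection, since total expected score rises with rating.
    """
    games = len(opponent_ratings)
    if not 0 < score < games:
        # A 0% or 100% score has no finite solution under this model.
        raise ValueError("performance rating undefined for 0% or 100% scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, r) for r in opponent_ratings)
        if total < score:
            lo = mid  # rating too low to explain the score
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: 3 points from 4 games against 2400-rated opponents.
print(round(performance_rating([2400, 2400, 2400, 2400], 3.0)))  # about 2591
```

Which games go into `opponent_ratings` is exactly the question of what should "count"; the arithmetic is the easy part.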

The real issue, however, is not the math calculation itself.  It is the issue
you have raised--what should be counted (included) in the calculation.

To be believable, the person doing the calculation should show its basis, and
it is up to the reader to accept or reject, believe or disbelieve, doubt or not
doubt the published calculations.

To be believable, the basis for the calculation should itself be believable
(trustworthy).  That is, the underlying games, conditions and results should be
as free from error and controversy as possible.  Otherwise the published results,
and hence the rating calculations, may be in doubt.

Just like science--there are many experimenters, there are many published
results of experiments, and there are often many differing opinions on the
interpretation of results.

When one interpreter points out a testing flaw, or uncontrolled variable that
might have affected the results of another person's published results of
experimenting, this may raise doubts about the meaning or usefulness or accuracy
of the published results.

In the same way, there may be some doubts about some calculations of ratings,
but as long as the basis for the calculation is shown by the calculator, all are
free to interpret the calculation results for themselves.

One can include the Russek match results in one's own assessment of a Rebel
rating, if one so chooses.

My preference is to calculate ratings for Rebel products based on open testing
and verifiable results published by Ed Schroder and the Rebel Team.

Ed has the highest motivation (Rebel programs are his business) to ensure his
testing is thorough and will stand up to public examination.

He offers prize money to FIDE-rated GMs and IMs to ensure they are motivated to
give their best effort to play his program.

Ed releases the testing conditions in advance, for comment by the world.

His games are shown 'live' on ICC to the world, so all can see the moves of the
games in his testing.

A trusted arbiter (good reputation) is at the scene of the human player to
ensure fair play.

His titled opponents are free to make their remarks after the game, to the
world, about the game and the play of Rebel software.

All viewers of the 'live' game on ICC may observe and make comments.

Ed utilizes the fastest and best hardware he can afford and trust.

He makes sure his settings are those he believes are best for the particular
test he undertakes (example--he determines if Anti-GM feature is switched on or
off, depending on his latest assessment of his own program).

Although this (Anti-GM) may be a test variable that changes from game to game,
Ed chooses which variables he wants to test in his experiments and which he
believes will allow his program to play the best it can play--I have no problem
with this, as long as it is disclosed and verifiable.

The trusted arbiter has a copy of the program to verify all moves were made by
the software Ed claims he used to make the moves.

Owners of the same software (or relatively recent, similar version) can verify
moves to a large degree using their copy of the Rebel software.

As long as the Rebel Team continues to disclose its verifiable testing
conditions and as long as those conditions seem trustworthy to me, I will
continue to use those results in my calculations.

Maybe I missed a posting, but I don't know what prize money, if any, is offered
to IM Russek, to ensure his motivation is there for the upcoming match with
Rebel.  In my mind, this aspect is an example of one of several things that
cause me to lean toward use of results published by Rebel Team and not others.

Certainly we should not always trust every vendor that makes claims about its
own software--that goes to the trustworthiness issue and the old sayings
'consider the source' and 'let the buyer beware'.  However, Ed and the Rebel
Team have not made any outlandish claims to my knowledge; they have not tried to
inflate their evidence to show Rebel is super-GM in strength (say, over 2600
FIDE ELO equivalent).  They have established a fine series of tests, under very
controlled circumstances, verifiable to a highly reasonable degree, and have let
the chips fall where they may (win, lose or draw).  Their experiments seem to
show Rebel is about 2500 FIDE ELO strength at this time (my calculations using
their published results), although more games should strengthen the
believability of the calculation, and the ELO estimate itself may change as
more games are played.
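To put a rough number on why more games make an estimate more believable, here is a small sketch of the statistical margin on a rating difference derived from a match score. It assumes the logistic Elo curve, independent games, and a normal approximation (delta method); the function name and the win/draw/loss counts are hypothetical and are not Rebel's actual results.

```python
import math


def elo_margin(wins, draws, losses, z=1.96):
    """Approximate +/- margin (in Elo points) on a rating difference.

    Uses the per-game score variance and the slope of the logistic
    Elo curve at the observed score fraction. z=1.96 is roughly 95%.
    """
    games = wins + draws + losses
    p = (wins + 0.5 * draws) / games           # score fraction
    if not 0 < p < 1:
        raise ValueError("margin undefined for 0% or 100% scores")
    variance = (wins + 0.25 * draws) / games - p * p
    se_p = math.sqrt(variance / games)         # standard error of p
    slope = 400.0 / (math.log(10) * p * (1 - p))  # d(Elo)/dp
    return z * se_p * slope


# Same 50% score, four times as many games: the margin is cut in half.
print(round(elo_margin(5, 0, 5)))    # about 215
print(round(elo_margin(20, 0, 20)))  # about 108
```

Under these assumptions the margin shrinks only with the square root of the number of games, so a handful of games leaves an estimate like "about 2500" quite loose.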

And I expect Ed to continue to improve his Rebel program, so it may even grow
stronger as he makes continuing programming additions or changes.  Time, and
good experiments, will tell.

--Steve Boak



Last modified: Thu, 15 Apr 21 08:11:13 -0700
