Computer Chess Club Archives



Subject: Re: New rating list based upon Human games /SSDF brought back into line

Author: Charles Unruh

Date: 05:30:31 12/11/99



On December 11, 1999 at 04:21:56, Stephen A. Boak wrote:

>>On December 10, 1999 at 11:25:58, Charles Unruh wrote:
>
>>I wonder, is it possible that once we get 20 games of GMs against Rebel at 40/2
>>(there also seems to be a good number of master and GM games vs Junior) that we
>>can then recalculate/readjust the SSDF ratings by first calculating the
>>ratings of other comps vs Rebel (with the rating calculated from its 20 games)?
>
>I suggested the same possibility, in a post a few weeks ago.
>
>There are multiple conceptual problems with this approach, as I pointed out
>then.  Recalibrating the SSDF ratings based on a single, recently FIDE rated
>program, such as Rebel, would have to overcome the following objections:

Well, I did say that Junior should be considered as well, since there are some
games on record.

>
>1. The individual rating of each SSDF rated computer program is not known with
>respect to play against FIDE rated humans.

That's a given, except for the program(s) that would serve as the base, namely
Rebel and possibly Junior; there might be another with a significant number of games.
>
>2. There is no simple math formula to know how well a computer program will play
>against humans, based on its SSDF rating.

Of course not; that's why we need to get some programs re-rated the same way
humans earn their ratings, by playing a series of games against rated humans.
That's what we are talking about doing.
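
To sketch what I mean (the opponents and results below are made up, and this
uses the common linear performance-rating approximation rather than the exact
FIDE tables), a 20-game series could be turned into a rating roughly like this:

  # Sketch: performance rating from a series of games against FIDE-rated
  # humans.  Opponents and score are hypothetical.
  def performance_rating(opponent_ratings, score):
      # Linear approximation: average opponent rating + 400 * (W - L) / N
      n = len(opponent_ratings)
      avg_opp = sum(opponent_ratings) / n
      wins_minus_losses = 2 * score - n   # draws count as 0.5 in 'score'
      return avg_opp + 400 * wins_minus_losses / n

  # e.g. 20 games at 40/2 against GMs averaging 2550, scoring 11.5/20
  print(performance_rating([2550] * 20, 11.5))   # about 2610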

>
>One could devise a formula, based in part on the SSDF ratings, but one could not
>test such a formula for Standard Error of Estimate (SEE), or Standard Error of
>Forecast, without playing many, many 40/2 games between many of those programs
>and strong FIDE-rated players.  And that formula might not work for programs not
>so tested during the development and checkout of the formula.

The current SSDF ratings are out of line with FIDE by about 125-150 points.
The aim here was not to use the current SSDF list to get a rating, but rather
to get a normal rating for a few programs, starting with Rebel, and then to
calculate the other programs' FIDE-scale ratings from their results against
these tested programs.  Rating drift would begin to occur, but it would take
some time, and we would have a more accurate rating relative to FIDE Elo than
we do now.
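
To make that second step concrete, here is a rough sketch (all numbers are
hypothetical; it simply inverts the standard Elo expected-score formula) of how
another program's FIDE-scale rating could be backed out from its comp-comp
score against an anchored Rebel:

  import math

  # Standard Elo expected score of a player rated r_a against one rated r_b
  def expected(r_a, r_b):
      return 1 / (1 + 10 ** ((r_b - r_a) / 400))

  # Invert it: given a score fraction against an anchored opponent, infer
  # the rating difference (clamped away from 0% and 100%).
  def rating_from_score(anchor_rating, score_fraction):
      p = min(max(score_fraction, 0.01), 0.99)
      return anchor_rating + 400 * math.log10(p / (1 - p))

  # e.g. Rebel anchored at 2450 FIDE (assumed) and program X scoring 55%
  # against Rebel in comp-comp play:
  print(rating_from_score(2450, 0.55))                    # roughly 2485
  # Sanity check: the inferred rating reproduces the observed score
  print(expected(rating_from_score(2450, 0.55), 2450))    # about 0.55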

>
>3. It is not obvious that the relative rating of computer programs in comp-comp
>play will hold for the same programs when they play humans at 40/2 time
>controls.

No one said that it was; again, that's why we are trying to calculate this
rating, to find out.

>
>A computer program that can outthink (let's say play tactically better) other
>computer programs will have a higher comp-comp rating.  That same program *may*
>have strategic and positional weaknesses that are worse than the strategic and
>positional flaws of the programs it beats using its tactical advantages--which
>weaknesses are not taken advantage of by those weaker programs.

I may be a great tactician and play well against tactically weak players, yet
there are some super-solid positional players I might not do so well against.
In the words of Lasker, "tactics are only possible when some mistake has been
made," and humans frequently make those mistakes.  Sure, it's possible that
some programs are weaker against humans than against other programs.  However,
we run into the same thing in regular play: there are some people (certain
players) who are better against comps than others, and it evens out.

>
>By contrast, reasonably strong human players might exploit those strategic and
>positional weaknesses more readily against the strongest comp-comp program than
>against its lesser rated (comp-comp) competitors.  A program is a combination of
>strategic, positional and tactical abilities.  Since most programs are weak in
>the strategic planning aspect, they establish their comp-comp ratings based more
>on their shorter term positional and tactical abilities.  Their strategic
>weaknesses are not strongly reflected in their relative comp-comp ratings, since
>those ratings emphasize their tactical skills.  When those programs play
>relatively strong FIDE-rated humans, they will likely receive FIDE-type ratings
>that more closely reflect their relative strategic skills (and weaknesses)
>because the humans will seek to beat the computers by attacking their known
>weaknesses.
>
>4. Therefore, even if the FIDE rating of one program (example, REBEL) is known,
>the relative rating spread among the many SSDF-rated computer programs is not
>known with respect to their performance against FIDE rated humans.  The relative
>SSDF ratings of the various programs might actually be significantly different,
>after those programs played lots of 40/2 games with strong human players.

True to some small extent; however, you are basing this on the assumption that
programs are SIGNIFICANTLY different in tactical and strategic abilities, which
I think is false.  Further, there are plenty of GMs who are, say, 2600
positionally, 2450 tactically, and maybe 2450-2500 in the endgame (or less);
that combination of factors still puts them over 2500.  A computer's gifts
offset its weaknesses just as humans offset theirs with their special talents.
>
>I enjoy speculating about computer program ratings and thinking of tests to
>measure their strengths and weaknesses.  The measure we chess fans want most,
>however, is how well the programs will do against strong human players.  For
>this there is no test as good as playing under serious tournament conditions,
>with money at stake, against strong human competition (FIDE rated).

100% correct; however, let's not ask for miracles.




