Computer Chess Club Archives

Subject: Re: New rating list based upon Human games /SSDF brought back into line

Author: Charles Unruh
Date: 05:10:24 12/11/99
On December 10, 1999 at 23:33:07, Tina Long wrote:

>On December 10, 1999 at 11:25:58, Charles Unruh wrote:
>
>>I wonder whether, once we get 20 games of GMs against Rebel at 40/2 (there
>>also seems to be a good number of master and GM games vs. Junior), we could
>>recalculate the SSDF ratings by first calculating the ratings of other comps
>>against Rebel, using the rating calculated from its 20 games. In fact, we
>>should certainly be able to find a few masters who are willing to play Rebel
>>or any comp at 40/2 over a period of a few weeks, to get a new base rating for
>>some progs and bring the SSDF back into line with FIDE ratings, as an attempt
>>to put this GM issue (or should I say the 2500 FIDE rating issue) to bed, and
>>show I was totally right once and for all :)!
>
>Hi Charles,
>A fine idea except:
>
>Rebel has changed (developed) during the course of the games so far, so the
>Rebel today is not the same Rebel that drew with Anand.  Hardware used has also
>changed.

This is true; however, the program has gotten no weaker.  When a person plays in
a tournament and then plays in another one a year later, they are not the same
person either, yet the games they played a year ago still count toward their
record.  So unless Rebel has been totally rewritten, I think it is still fair to
use the games.  All the older games can do is depress the rating somewhat, and
since I'm sure the result will still show GM strength, that is more than
acceptable.  I'll take a slight rating deflation to counter claims of inflation
any day.
>
>SSDF does not (officially?) test Rebel, at Ed's request, due to Rebel's problems
>playing with Autoplayer.  So while the comparison could be done by SSDF they are
>(morally) not allowed to publish the results.

Whose morals? Not mine. Where did you buy this book of morals? I want to talk to
the author.

>
>I'd like to see some of these top programs entered in real round-robin
>tournaments, preferably with a decent "computer board" and the computer hidden
>away, to lessen the distraction for the opponents.  I think this is the best way
>to get a genuine ELO rating.

I'd love it too, but I want to be realistic: nobody wants to play the comps in
an actual regular 40/2 tournament, not even GMs.  Why?  Not because they think
the comps are weak and below grandmaster strength, but because they are sure the
comps are strong, GM strength, and a harder opponent than most humans, so
playing one puts their tournament score at risk.

>
>There is still a problem in getting a rating though.  The "program", as time
>goes by, will want to upgrade both software & hardware.
>
>Wouldn't it be great to see the computer and cables being carried up to the
>stage of the hall where the "top" games are being played.  The post-game
>analysis would be interesting to see as the human implies he/she could have won
>if....
>
>As the programs get better and the hardware gets faster, the chance of computers
>playing in "real" tournaments seems more remote.
>
>If we simply used the SSDF results to say:
>
>"these few programs are currently best of those being tested on this hardware"
>"the next few programs may be as good as the best but are probably not quite
>that good"
>"this program version X is probably a bit better than version X-1"
>
>then IMO the goals of SSDF would be more correctly interpreted.
>
>I think that using the SSDF table to say:
>
>"this program is rated yyyy"
>"this program is y points better than that program"
>
>is incorrect.
>

No one is interested in "this program is this many points better than that
program"; we can already do something close to that.  What people want to see is
how strong the programs are compared to human Elo ratings.  So though you might
prefer to see a rating list of that nature, you are in the minority.

>As soon as an SSDF tested program plays enough Tournament games to get a rating
>then your recalibration idea is possible, in the realms of the +/- confidence
>level.

I agree, and it should be done, to get the best approximation of a human FIDE
rating and end this GM strength debate, even though, as an ICC poll showed a
couple of software and hardware generations ago, most people are already
certain that comps are GM strength.
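
The recalibration discussed in this thread amounts to two steps: estimate a
performance rating for one program from its games against rated humans, then
shift the whole computer list by the difference between that estimate and the
program's list rating. Below is a minimal Python sketch of that arithmetic,
assuming the standard Elo logistic expectancy curve. The function names, program
names, and all numbers are made up for illustration and are not SSDF or FIDE
data. It also assumes the anchor program already has a rating on the list's own
scale.

```python
def expected_score(rating, opponent):
    """Standard Elo expectancy of `rating` against `opponent` (0..1)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(opponent_ratings, score, lo=0.0, hi=4000.0):
    """Rating at which the summed expected score equals the actual score.

    Solved by bisection; undefined for 0% or 100% results.
    """
    if not 0 < score < len(opponent_ratings):
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    for _ in range(100):
        mid = (lo + hi) / 2
        total = sum(expected_score(mid, r) for r in opponent_ratings)
        if total < score:
            lo = mid  # guess too low: expected score below actual
        else:
            hi = mid
    return (lo + hi) / 2


def recalibrate(list_ratings, anchor_name, anchor_human_rating):
    """Shift every list rating by the anchor's (human - list) offset."""
    offset = anchor_human_rating - list_ratings[anchor_name]
    return {name: r + offset for name, r in list_ratings.items()}


if __name__ == "__main__":
    # Hypothetical data: 20 games against 2500-rated opponents, 9.5 points.
    human_opponents = [2500] * 20
    anchor = performance_rating(human_opponents, 9.5)

    # Hypothetical computer-list ratings (not real SSDF figures).
    computer_list = {"Program A": 2650.0, "Program B": 2600.0}
    print(recalibrate(computer_list, "Program A", anchor))
```

With only 20 games the performance estimate carries a wide error margin,
roughly on the order of 100 Elo, and every rating on the shifted list inherits
that uncertainty.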