Computer Chess Club Archives



Subject: Re: New rating list based upon Human games /SSDF brought back into line

Author: Tina Long

Date: 20:33:07 12/10/99



On December 10, 1999 at 11:25:58, Charles Unruh wrote:

>I wonder, is it possible that once we get 20 games of GMs against Rebel at 40/2
>(there also seems to be a good number of master and GM games vs Junior), we
>could then recalculate/readjust the SSDF ratings by first calculating the
>ratings of other comps vs Rebel (using the rating calculated from its 20 games)?
>In fact, we should certainly be able to find a few masters who are willing to
>play Rebel or any comp at 40/2 over a few weeks to get a new
>base rating for some progs, bringing the SSDF back into line with FIDE ratings,
>in an attempt to put this GM issue (or should I say 2500-FIDE-rating issue) to
>bed and show I was totally right once and for all :)!
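The recalibration Charles proposes rests on the standard Elo performance-rating calculation: take the score against opposition of known average rating and invert the expected-score formula. A minimal sketch of that arithmetic (the 20-game score and 2550 opposition average below are hypothetical numbers for illustration, not figures from this thread):

```python
import math

def expected_score(rating, opp_rating):
    """Standard Elo expected score for one game."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def performance_rating(avg_opp_rating, score, games):
    """Invert the Elo formula: the rating whose expected score
    over `games` games equals the observed score."""
    p = score / games
    # Clamp to avoid infinities at 0% or 100% scores.
    p = min(max(p, 0.01), 0.99)
    return avg_opp_rating - 400.0 * math.log10(1.0 / p - 1.0)

# Hypothetical example: 11.5/20 against 2550-average GMs at 40/2
print(round(performance_rating(2550, 11.5, 20)))  # roughly 2600
```

Once a program has such a performance rating, the rest of the SSDF list could in principle be shifted by its point difference against that program on the list.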

Hi Charles,
A fine idea except:

Rebel has changed (developed) over the course of the games played so far, so the
Rebel of today is not the same Rebel that drew with Anand.  The hardware used has
also changed.

SSDF does not (officially?) test Rebel, at Ed's request, due to Rebel's problems
playing with the Autoplayer.  So while SSDF could do the comparison, they are
(morally) not allowed to publish the results.

I'd like to see some of these top programs entered in real round-robin
tournaments, preferably with a decent "computer board" and the machine itself
hidden away, to lessen the distraction for the opponents.  I think this is the
best way to get a genuine Elo rating.

There is still a problem in getting a rating, though: as time goes by, the
"program" will want to upgrade both software and hardware.

Wouldn't it be great to see the computer and cables being carried up to the
stage of the hall where the "top" games are being played?  The post-game
analysis would be interesting to see, as the human implies he/she could have won
if....

As the programs get better and the hardware gets faster, the chance of computers
playing in "real" tournaments seems more remote.

If we simply used the SSDF results to say:

"these few programs are currently best of those being tested on this hardware"
"the next few programs may be as good as the best but are probably not quite
that good"
"this program version X is probably a bit better than version X-1"

then IMO the goals of SSDF would be more correctly interpreted.

I think that using the SSDF table to say:

"this program is rated yyyy"
"this program is y points better than that program"

is incorrect.

As soon as an SSDF-tested program plays enough tournament games to get a rating,
your recalibration idea becomes possible, within the bounds of the +/-
confidence level.

And what if I take program A to a series of tournaments and achieve a rating of
2200, and you take the same program (my copy of program A and my computer) to a
series of tournaments and achieve a rating of 2500?  After 30 or so games each,
it is quite feasible that such variances can occur.
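That claim about 30-game samples can be checked numerically. A rough sketch (all the numbers here are hypothetical, chosen only to illustrate the point): simulate many 30-game events for a program of "true" strength 2350 against 2350-rated opposition, and look at the spread of the resulting performance ratings.

```python
import math
import random

def expected_score(rating, opp_rating):
    """Standard Elo expected score for one game."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def performance_rating(opp_rating, score, games):
    """Invert the Elo formula to get a performance rating."""
    p = min(max(score / games, 0.01), 0.99)
    return opp_rating - 400.0 * math.log10(1.0 / p - 1.0)

random.seed(1)
GAMES, TRUE_RATING, OPP_RATING = 30, 2350, 2350
p_win = expected_score(TRUE_RATING, OPP_RATING)

perfs = []
for _ in range(2000):
    # Score each game as win/loss only (draws ignored for simplicity).
    score = sum(1 for _ in range(GAMES) if random.random() < p_win)
    perfs.append(performance_rating(OPP_RATING, score, GAMES))

perfs.sort()
low, high = perfs[50], perfs[-50]  # rough 95% interval
print(round(low), round(high))  # the spread easily exceeds +/- 100 points
```

With the binomial spread of a 30-game score, performance ratings well over 100 points above or below the true strength are entirely plausible, which is exactly the 2200-vs-2500 worry.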

SSDF may have already recalibrated the whole list on 2200.  What to do?

It would be nice though to have some confidence about playing strength of
computers.

Hi guys,
Tina Long





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.