Author: Tina Long
Date: 21:41:28 12/11/99
On December 11, 1999 at 08:10:24, Charles Unruh wrote:

>On December 10, 1999 at 23:33:07, Tina Long wrote:
>
>>On December 10, 1999 at 11:25:58, Charles Unruh wrote:
>>
>>>I wonder: is it possible that, once we get 20 games of GMs against Rebel at
>>>40/2 (there also seems to be a good number of master and GM games vs. Junior),
>>>we can then recalculate/readjust the SSDF ratings by first calculating the
>>>ratings of other computers vs. Rebel, using the rating calculated from its 20
>>>games? In fact, we should certainly be able to find a few masters willing to
>>>play Rebel, or any computer, at 40/2 over a period of a few weeks, to get a
>>>new base rating for some programs and bring the SSDF back into line with FIDE
>>>ratings. That would be an attempt to put this GM issue, or should I say this
>>>2500-FIDE-rating issue, to bed, and show I was totally right once and for
>>>all :)!
>>
>>Hi Charles,
>>A fine idea, except:
>>
>>Rebel has changed (developed) during the course of the games so far, so the
>>Rebel of today is not the same Rebel that drew with Anand. The hardware used
>>has also changed.
>
>This is true; however, the program has gotten no weaker. When a person goes to
>a tournament and then attends another one a year later, they are not the same
>person either, yet the games they played a year ago still comprise their
>record. So unless Rebel has been totally rewritten, I think it is still fair
>to use the games, because all the older games can do is depress the rating
>somewhat, and since I'm sure this will still show GM strength, it's still more
>than acceptable. I'll take a slight rating deflation to counter claims of
>inflation any day.

Agreed.

>>
>>The SSDF does not (officially?) test Rebel, at Ed's request, due to Rebel's
>>problems playing with the Autoplayer. So while the comparison could be done
>>by the SSDF, they are (morally) not allowed to publish the results.
>
>Whose morals? Not mine. Where did you buy this book of morals? I want to talk
>to the author.
If I ask you, "Please don't expose me to the general public, and these are my
reasons," you must decide yourself what you should do, and particularly WHY you
would expose me after I have asked you not to. People have said here that the
SSDF can still test Genius 6 and not get sued; that's why I included the
"(morally)".

>
>>
>>I'd like to see some of these top programs entered in real round-robin
>>tournaments, preferably with a decent "computer board" and the computer
>>hidden away, to lessen the distraction for the opponents. I think this is
>>the best way to get a genuine Elo rating.
>
>I'd love it too, but I want to be realistic: nobody wants to play the
>computers in an actual regular 40/2 tournament, not even GMs. Why? Not because
>they think the computers are of weak, non-grandmaster strength, but because
>they are sure the computers are strong, of GM strength, and a harder opponent
>than most human opponents, thus risking their tournament score.
>
>>
>>There is still a problem in getting a rating, though. The "program", as time
>>goes by, will want to upgrade both software and hardware.
>>
>>Wouldn't it be great to see the computer and cables being carried up to the
>>stage of the hall where the "top" games are being played? The post-game
>>analysis would be interesting to see, as the human implies he/she could have
>>won if....
>>
>>As the programs get better and the hardware gets faster, the chance of
>>computers playing in "real" tournaments seems more remote.
>>
>>If we simply used the SSDF results to say:
>>
>>"these few programs are currently the best of those being tested on this
>>hardware"
>>"the next few programs may be as good as the best but are probably not quite
>>that good"
>>"this program version X is probably a bit better than version X-1"
>>
>>then IMO the goals of the SSDF would be more correctly interpreted.
>>
>>I think that using the SSDF table to say:
>>
>>"this program is rated yyyy"
>>"this program is y points better than that program"
>>
>>is incorrect.
>>
>
>No one is interested in "this program is this many points better than this
>program";

Huh? That's what all the hullabaloo is about every time the SSDF releases its
results. "Yippee, CM6K is best!" (it was 1, ONE, point better than second,
ignoring the +/- 50). "Tiger's no good, it's only leading by 25 points; program
X once led by 100 points."

>we can already do something close to that. What people want to see is how
>strong they are compared to human Elo ratings. So though you might prefer to
>see some rating list of that nature, you are in the minority.

"Don't let the SSDF list be late AGAIN": who in the minority posted that
thread here?

>
>>As soon as an SSDF-tested program plays enough tournament games to get a
>>rating, your recalibration idea is possible, within the realms of the +/-
>>confidence level.
>
>I agree, and it should be done, to get the best approximation of a human FIDE
>rating and end this GM-strength debate. Even though, as was noted in an ICC
>poll a couple of software and hardware generations ago, most people are
>already certain that computers are GM strength.
>>

Absolutely certain  43 votes  22.16 %
Very likely         50 votes  25.77 %
Likely              19 votes   9.79 %
(Poll Question #14, ended September 1, 1998)

"Most people are certain"? Hmm. Less than 25% of people were certain, and
nearly 50% thought it very likely or were certain.

I'm not picking a feud with Charles here; I more or less agree with his
sentiments in this thread.

Hi guys,
Tina Long
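As an aside, the recalibration proposed in this thread amounts to computing a performance rating: find the rating at which a program's expected score against its rated human opposition matches its actual score, then chain other programs off that anchor. Here is a minimal sketch of that first step using the standard Elo expected-score formula; the function names, the 2550 opposition rating, and the 11.5/20 score are purely illustrative assumptions, not numbers from the thread.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Standard Elo expected score for `rating` against one opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def performance_rating(opp_ratings: list, score: float) -> float:
    """Rating whose total expected score over `opp_ratings` equals the
    actual `score`, found by bisection (expected score is monotonic in
    the player's rating)."""
    lo, hi = 0.0, 4000.0
    for _ in range(60):  # plenty of iterations for sub-point precision
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opp_ratings) < score:
            lo = mid  # rating guess too low: expected score fell short
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical example: 20 games vs. 2550-rated opposition, scoring 11.5/20.
perf = performance_rating([2550.0] * 20, 11.5)
print(round(perf))  # a bit above 2600
```

Note the thread's caveat still applies: with only 20 games the confidence interval around such a number is wide, which is exactly the "+/-" qualification mentioned above.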
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.