Computer Chess Club Archives

Subject: Re: The New SSDF List Accurate?

Author: Robert Hyatt
Date: 21:04:56 09/19/00
On September 19, 2000 at 21:43:00, Christophe Theron wrote:

>On September 18, 2000 at 20:12:46, Dann Corbit wrote:
>
>>On September 18, 2000 at 19:00:49, Christophe Theron wrote:
>>
>>>On September 18, 2000 at 17:42:06, odell hall wrote:
>>>
>>>>Hello CCC
>>>>
>>>>
>>>>  How many think the new SSDF list is relatively accurate?  Personally, I commend
>>>>SSDF for doing an outstanding job. Based on my observations of the Grandmaster
>>>>Challenge and other 40/2 events, I think the list is very reliable.
>>>>I believe it is safe to say that any top program running on a K6-2 450 is 2500
>>>>elo, or very near.  I think that now that the ratings have been significantly
>>>>lowered, this list will be taken far more seriously in determining FIDE ratings
>>>>for modern programs. I am curious whether some skeptics of the list in the past
>>>>consider the list still too high, or just about right.  Opinions welcome.
>>>
>>>
>>>I guess the adjustment was justified for the top programs on recent hardware,
>>>but for the older programs on slow hardware the change has been really unfair.
>>>
>>>I'm talking about the dedicated chess computers rated around 1900-2200 elo. Now
>>>they are rated 1800-2100 elo, which is probably not fair at all.
>>>
>>>It would have been better to do the change differently. Maybe by adding the
>>>games against human players in the SSDF database, giving them a higher weight
>>>than the comp-comp games, then recompute all the ratings based on this.
>>>
>>>I don't know if it is the best way, but just decreasing the whole list by 100
>>>elo points is not exactly a scientific method.
>>>
>>>The good thing is that it will stop the main criticism against the SSDF list, I
>>>mean people saying that the computers were overrated.
>>>
>>>I'm a strong supporter of the SSDF list. This is why I believe I can give my
>>>frank opinion about this change. :)
>>
>>It makes no difference what number they add or drop from the list.
>>
>>An ELO prediction is based purely on rating differences.
>>
>>(x + 1000) - (y + 1000) is identical to (x-y).
>
>
>
>It makes a difference.
>
>While the goal of the list has always been to compare computers with computers,
>the initial ELO calibration, several years ago, was made by letting
>computers play human players.
>
>So it was possible with the previous list to have an idea of the strength of the
>computers against human players. At least for the class of computers that had
>been tested against humans (the dedicated computers between 1500 and 2200 elo).
>
>Also, the purpose of the 100 elo decrease was to calibrate the top of the list
>with some recent data collected during computers-humans matches. So the ELOs at
>the top of the list can be compared with more confidence to human FIDE ELOs.
>
>So the purpose of all the calibrations done by the SSDF is to allow comparison
>of SSDF ELO with FIDE ELO.
>
>
>
>    Christophe
>

Things are simply way off now.  If you calibrate the top of a rating pool that
has "stretched" the ratings too high, then you squash the lower part of that
same pool.  In effect, the top 10 may be closer to reality, but the bottom half
is now farther from the truth.
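
To put illustrative numbers on this, here is a minimal sketch. It assumes the standard logistic Elo expectation on a 400-point scale, and it models the "stretch" as a simple linear distortion around a pivot rating. The stretch factor, the pivot, the function names and every rating below are made-up assumptions for illustration only, not SSDF or FIDE data.

```python
# Minimal sketch under stated assumptions; no real SSDF or FIDE data is used.

def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# 1) A uniform shift changes no prediction *inside* the pool,
#    because only the rating difference enters the formula.
before = expected_score(2600.0, 2000.0)
after = expected_score(2600.0 - 100.0, 2000.0 - 100.0)

# 2) Hypothetical stretched pool: listed ratings spread out faster than
#    the "true" ratings do. STRETCH and PIVOT are assumptions.
STRETCH = 1.25
PIVOT = 1500.0

def listed_rating(true_rating: float) -> float:
    """Listed rating of a player in the hypothetical stretched pool."""
    return PIVOT + STRETCH * (true_rating - PIVOT)

true_top, true_bottom = 2500.0, 1900.0
listed_top = listed_rating(true_top)
listed_bottom = listed_rating(true_bottom)

# Pick the single constant that makes the top of the list exactly right.
shift = listed_top - true_top

error_top_after = (listed_top - shift) - true_top
error_bottom_before = listed_bottom - true_bottom
error_bottom_after = (listed_bottom - shift) - true_bottom

print("in-pool prediction before/after shift:", before, after)
print("top error after shift:", error_top_after)
print("bottom error before shift:", error_bottom_before)
print("bottom error after shift:", error_bottom_after)
```

In this toy model the constant that fixes the top leaves every in-pool prediction untouched, yet moves the lower entry farther from its true value than it was before the shift. A single additive constant cannot undo a stretch.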

There is no way to "adjust" such a rating pool, statistically, other than to
play games between the players in the two pools that are of interest.  But
the point everybody is overlooking is that doing so produces yet a third
rating pool and does _nothing_ to calibrate either of the two original rating
pools, whatsoever.  I.e., we know how humans do against each other (FIDE).  We
know how computers do against each other (SSDF).  We know how a few programs
do against humans (let's call this the "carson list", pool C).  What
statistical principle can someone quote that makes it valid to take rating
pool (C) and from that adjust rating pool (SSDF) to make it more closely match
rating pool (FIDE)?

No statistics I know of supports that sort of sampling anti-theory...



>
>
>
>
>>
>>The adjustment has no impact whatsoever on the figures.  If it makes some people
>>happy, it just means that they had no idea what the figures meant in the first
>>place.
>>
>>In any case, I would be rather surprised if the new numbers fit human players
>>better unless the adjustment was based upon a large collection of real data.  If
>>it had been based on a large collection of real data, that would have been a
>>pretty exciting experiment, and I think we would have heard about it.
>>
>>Hence, my conclusion is that the change was made solely to hush the wild beasts
>>of the forest who feared the big and mighty numbers but will be calmed by the
>>melodious tones of n-100 music.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.