Computer Chess Club Archives

Subject: Re: Does the New SSDF List Reflect the Real Strength of Programs?

Author: Derrick Daniels

Date: 18:07:18 10/24/01


On October 24, 2001 at 20:59:40, Dann Corbit wrote:

>On October 24, 2001 at 19:45:45, Christophe Theron wrote:
>>On October 24, 2001 at 02:20:26, Kevin Stafford wrote:
>>>>
>>>>I also would not go so far as to say that comparisons are meaningless -- just
>>>>that the numerical value connections are unknown.
>>>
>>>Yes, but is there any way to determine what the connection is without playing a
>>>fairly large set of new games between members of each pool? If so, it would seem
>>>to me that at present comparisons are indeed 'meaningless'. If there is a
>>>statistical way to determine this without new games being played, then I'm of
>>>course wrong.
>>>
>>>>
>>>>An entity that is at the top of either list will be quite strong, and one at the
>>>>bottom not so strong -- that much is obvious.
>>>>
>>>
>>>Well of course! I wouldn't say this constitutes a comparison as the original
>>>poster meant it though.
>>>
>>>-Kevin
>>
>>I'm not sure if it works mathematically, but I think it would be enough to
>>adjust the list at both ends (that is, adjust the ratings of the best and the
>>worst entities listed by the SSDF to human ratings) to get accurate ratings for
>>the whole list.
>
>It works pragmatically, but mathematically, you would have very large error
>bars.  In order to reduce the error to an acceptable level (say, 50 Elo)
>you would have to play thousands of games across the pools.  Playing just
>the top and bottom entrants would also not be the most effective way to get an
>accurate answer.  Better to spread them out.  I think practically speaking that
>playing the extremes is a bad idea too.  Imagine trying to find out how good you
>are by playing 50 games against Garry Kasparov, and then 50 against a random
>legal-move generator.
>
>0-50 followed by 50-0
>
>What did you learn?  Your Elo is somewhere between 3000 and 0, +/- 500 Elo.
>
>You get a lot more information content by playing against a large cross section
>of opponents.  I suspect (but I have not tested this assumption mathematically)
>that you get a maximum extraction of information by playing against opponents
>100 Elo better than you and then opponents 100 Elo below you.  If they are at
>exactly your level, there is too much randomness in the results.  If they are
>vastly superior or inferior, it takes too many games to see the difference, and
>if you should accidentally win or lose one more than you are supposed to, it
>will clobber the estimate.
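
[The quoted point about opponent selection can be checked numerically. What follows is only a rough sketch under assumptions that are not in the thread: the standard logistic Elo curve, no draws, independent games, and the large-sample (Fisher information) approximation. The function names `expected_score` and `games_needed` are made up for illustration. It estimates how many games against an opponent at a given rating gap would be needed to bring the standard error of a rating estimate down to about 50 Elo.]

```python
import math

def expected_score(gap):
    """Expected score against an opponent `gap` Elo weaker (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

def games_needed(gap, target_se=50.0):
    """Approximate games needed for a standard error of `target_se` Elo.

    Assumes win/loss games only (no draws) and uses the large-sample
    approximation: standard error = 1 / sqrt(n * information per game).
    """
    p = expected_score(gap)
    # Fisher information about the rating carried by one win/loss game.
    info_per_game = p * (1.0 - p) * (math.log(10) / 400.0) ** 2
    return math.ceil(1.0 / (info_per_game * target_se ** 2))

# Rating gap, expected score, approximate games needed.
for gap in (0, 100, 200, 400, 800):
    print(gap, round(expected_score(gap), 3), games_needed(gap))
```

[In this simplified model the game count is lowest for evenly matched opponents, rises only slightly out to a gap of about 100 Elo, and then climbs steeply for lopsided pairings, which is the "too many games to see the difference" effect described above. Allowing for draws or for uncertainty in the opponents' own ratings would change the numbers.]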



I wonder, Mr. Corbit, whether you believe the present Elo system used by FIDE is
mathematically correct. From the posts of yours that I have read, there are never
enough games or data, whether human or computer, to get a reliable estimate. Why
don't you just make the blanket statement that no rating, period, can be
accurately measured?
>;-)



Last modified: Thu, 15 Apr 21 08:11:13 -0700
