Computer Chess Club Archives


Subject: Re: Does the New SSDF List Reflect the Real Strength of Programs?

Author: Dann Corbit

Date: 18:20:36 10/24/01

On October 24, 2001 at 21:07:18, Derrick Daniels wrote:

>On October 24, 2001 at 20:59:40, Dann Corbit wrote:
>
>>On October 24, 2001 at 19:45:45, Christophe Theron wrote:
>>>On October 24, 2001 at 02:20:26, Kevin Stafford wrote:
>>>>>
>>>>>I also would not go so far as to say that comparisons are meaningless -- just
>>>>>that the numerical relationship between the two rating scales is unknown.
>>>>
>>>>Yes, but is there any way to determine what the connection is without playing a
>>>>fairly large set of new games between members of each pool? If not, it would seem
>>>>to me that at present comparisons are indeed 'meaningless'. If there is a
>>>>statistical way to determine this without new games being played, then I'm of
>>>>course wrong.
>>>>
>>>>>
>>>>>An entity that is at the top of either list will be quite strong, and one at the
>>>>>bottom not so strong -- that much is obvious.
>>>>>
>>>>
>>>>Well of course! I wouldn't say this constitutes a comparison as the original
>>>>poster meant it though.
>>>>
>>>>-Kevin
>>>
>>>I'm not sure whether it works mathematically, but I think it would be enough to
>>>adjust the list at both ends (that is, calibrate the ratings of the best and the
>>>worst entities listed by the SSDF against human ratings) to get accurate ratings
>>>for the whole list.
>>
>>It works pragmatically, but mathematically you would have very large error
>>bars.  To reduce the error to an acceptable level (say, 50
>>Elo) you would have to play thousands of games across the pools.  Playing just
>>the top and bottom entrants would also not be the most effective way to get an
>>accurate answer; it is better to spread them out.  I think, practically speaking,
>>that playing the extremes is a bad idea too.  Imagine trying to find out how good
>>you are by playing 50 games against Garry Kasparov and then 50 against a random
>>legal-move generator.
>>
>>0-50 followed by 50-0
>>
>>What did you learn?  Your Elo is somewhere between 3000 and 0, +/- 500 Elo.
>>
>>You get a lot more information by playing against a large cross section
>>of opponents.  I suspect (but I have not tested this assumption mathematically)
>>that you extract the most information by playing against opponents 100 Elo
>>stronger than you and then opponents 100 Elo weaker than you.  If they are at
>>exactly your level, there is too much randomness in the results.  If they are
>>vastly superior or inferior, it takes too many games to see the difference, and
>>if you should accidentally win or lose one more game than expected, it
>>will clobber the estimate.
>
>
>
> I wonder, Mr. Corbit, whether you believe the present Elo system used by FIDE
>is mathematically correct.

I think FIDE should publish error bars for players' ratings, as the SSDF does.
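To make "error bars" concrete, here is a minimal sketch of how a rating and its one-sigma error could be computed from game results. It assumes the standard logistic Elo expectation, independent games and, for the error bar, decisive games only; the function names and the `(opponent_rating, games, points)` input format are hypothetical, not the SSDF's or FIDE's actual procedure. Run on the Kasparov / random-mover thought experiment quoted earlier (0/50 against a 3000 opponent, 50/50 against a 0 opponent), it gives an estimate of 1500 with an error bar of roughly 1300 Elo, so the informal +/- 500 was, if anything, optimistic.

```python
import math

K = math.log(10) / 400.0  # slope constant of the logistic Elo curve


def expected_score(r: float, opp: float) -> float:
    """Expected score of a player rated r against an opponent rated opp."""
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))


def estimate_rating(results, lo=-4000.0, hi=8000.0, tol=1e-6):
    """Maximum-likelihood rating from hypothetical (opponent_rating, games, points) tuples.

    The likelihood is maximised where total expected score equals total points.
    Assumes the root lies inside [lo, hi].
    """
    games = sum(n for _, n, _ in results)
    points = sum(p for _, _, p in results)
    if points <= 0 or points >= games:
        # A 0% or 100% score has no finite maximum-likelihood rating.
        raise ValueError("score must be strictly between 0% and 100%")

    def excess(r):
        return sum(n * expected_score(r, opp) for opp, n, _ in results) - points

    # excess(r) is increasing in r, so bisection converges to the single root.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def error_bar(r: float, results) -> float:
    """Asymptotic one-sigma error: 1 / sqrt(total Fisher information at r)."""
    info = 0.0
    for opp, n, _ in results:
        e = expected_score(r, opp)
        info += n * K * K * e * (1.0 - e)  # information from n decisive games
    return 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # 0/50 against a 3000-rated opponent, then 50/50 against a 0-rated one.
    games = [(3000.0, 50, 0.0), (0.0, 50, 50.0)]
    r = estimate_rating(games)
    print(f"estimate {r:.0f}, error bar +/- {error_bar(r, games):.0f}")
```

A 0% or 100% score raises an error by design: against a single opponent you beat every time, the likelihood keeps rising with rating and there is no finite estimate.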

> From the posts of yours I have read, there are never
>enough games or data, whether human or computer, to get a reliable estimate,

Depends on what you mean by reliable.  I think that +/- 25 Elo at one standard
deviation is a pretty darn good estimate.  The SSDF gets numbers in this range,
and towards the end of a GM's career the FIDE data would produce similar values.
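A rough way to see how many games a +/- 25 Elo error bar takes, and to check the earlier suspicion about opponents 100 Elo away, is to use the Fisher information of a single game under the logistic Elo model. This is a sketch under simplifying assumptions (independent games, no draws, opponents' ratings known exactly); `info_per_game` and `games_needed` are hypothetical helpers. Under those assumptions the information per game is largest against an equal opponent, about 8% lower at a 100 Elo gap, and under 4% of the maximum at an 800 Elo gap; a one-sigma error of 25 Elo takes about 194 games against equal opponents (about 49 for 50 Elo). Draws, which the sketch ignores, lower the variance of each game's score and would likely reduce those counts.

```python
import math

K = math.log(10) / 400.0  # slope constant of the logistic Elo curve


def expected_score(gap: float) -> float:
    """Expected score against an opponent `gap` Elo weaker (negative = stronger)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


def info_per_game(gap: float) -> float:
    """Fisher information about your rating from one decisive game."""
    p = expected_score(gap)
    return K * K * p * (1.0 - p)


def games_needed(target_se: float, gap: float = 0.0) -> int:
    """Games against opponents `gap` Elo away for a one-sigma error of target_se."""
    return math.ceil(1.0 / (info_per_game(gap) * target_se ** 2))


if __name__ == "__main__":
    for gap in (0, 100, 200, 400, 800):
        ratio = info_per_game(gap) / info_per_game(0)
        print(f"gap {gap:4d}: {ratio:.2f} of max info, "
              f"{games_needed(25, gap)} games for +/- 25 Elo")
```

The loop prints, for a few rating gaps, the fraction of the maximum per-game information and the number of decisive games needed for +/- 25 Elo.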

> so why
>don't you just make the blanket statement that no rating, period, can be
>accurately measured?
>;-)

I don't like blanket statements.  Wait a minute -- that's a blanket statement!
So I don't like it.
;-)



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.