Computer Chess Club Archives


Subject: Re: Does the New SSDF List Reflect the Real Strength of Programs?

Author: Dann Corbit

Date: 17:59:40 10/24/01


On October 24, 2001 at 19:45:45, Christophe Theron wrote:
>On October 24, 2001 at 02:20:26, Kevin Stafford wrote:
>>>
>>>I also would not go so far as to say that comparisons are meaningless -- just
>>>that the numerical value connections are unknown.
>>
>>Yes, but is there any way to determine what the connection is without playing a
>>fairly large set of new games between members of each pool? If so, it would seem
>>to me that at present comparisons are indeed 'meaningless'. If there is a
>>statistical way to determine this without new games being played, then I'm of
>>course wrong.
>>
>>>
>>>An entity that is at the top of either list will be quite strong, and one at the
>>>bottom not so strong -- that much is obvious.
>>>
>>
>>Well of course! I wouldn't say this constitutes a comparison as the original
>>poster meant it though.
>>
>>-Kevin
>
>I'm not sure if it works mathematically, but I think it would be enough to
>adjust the list at both ends (that is, adjust the rating of the best and the
>worst entity listed by the SSDF to human ratings) to get accurate ratings for
>the whole list.

It works pragmatically, but mathematically you would have very large error
bars.  To reduce the error to an acceptable level (say, +/- 50 ELO) you would
have to play thousands of games across the pools.  Playing just the top and
bottom entrants would also not be the most effective way to get an accurate
answer; better to spread the games across the whole list.  Practically
speaking, I think playing the extremes is a bad idea too.  Imagine trying to
find out how good you are by playing 50 games against Garry Kasparov, and then
50 more against a random legal-move generator.

0-50 followed by 50-0

What did you learn?  Your ELO is somewhere between 3000 and 0, +/- 500 ELO.
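
A rough sketch of why the double sweep says so little.  It assumes the usual
logistic Elo expectation curve and treats every game as a plain win or loss
with no draws; both are simplifications of this sketch, and the function names
are made up for illustration.  It prints the chance of losing all 50 games for
several rating gaps.

```python
def expected_score(rating_diff):
    """Logistic Elo expectation for a player rated rating_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def sweep_probability(rating_diff, games=50):
    """Chance of losing every game of the match.

    Assumption: each game is an independent win/loss trial whose win
    probability equals the Elo expected score (draws are ignored).
    """
    return (1.0 - expected_score(rating_diff)) ** games


if __name__ == "__main__":
    # Negative gap = you are rated that far below the opponent.
    for gap in (-400, -600, -800, -1000, -1200):
        print(f"{gap:6d} ELO: P(0-50) = {sweep_probability(gap):.3f}")
```

Under these assumptions a 0-50 result is the most likely outcome for anybody
rated roughly 800 or more points below the opponent, so the sweep cannot
distinguish between players anywhere in that range.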

You get a lot more information by playing against a large cross section of
opponents.  I suspect (but I have not tested this assumption mathematically)
that you extract the most information by playing opponents about 100 ELO above
you and opponents about 100 ELO below you.  If they are at exactly your level,
there is too much randomness in the results.  If they are vastly superior or
inferior, it takes too many games to see the difference, and if you should
accidentally win or lose one more game than you are supposed to, it will
clobber the estimate.
;-)
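
One way to put numbers on that suspicion is the Fisher information a single
game carries about your rating.  The sketch below is only a rough check under
stated assumptions: the logistic Elo curve, win/loss games with no draws, and
opponents whose ratings are known exactly (which is not true when two separate
pools are being linked).  The helper names are hypothetical.

```python
import math

# Slope constant of the logistic Elo curve, in 1/ELO.
ELO_SCALE = math.log(10) / 400.0


def expected_score(rating_diff):
    """Logistic Elo expectation for a player rated rating_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def info_per_game(rating_diff):
    """Fisher information about your own rating from one win/loss game.

    For a Bernoulli outcome with p = expected_score(diff), the information
    is (dp/d rating)^2 / (p * (1 - p)) = ELO_SCALE^2 * p * (1 - p).
    """
    p = expected_score(rating_diff)
    return ELO_SCALE ** 2 * p * (1.0 - p)


def games_for_std_error(rating_diff, target_se=50.0):
    """Large-sample estimate of the games needed for a standard error of target_se."""
    return math.ceil(1.0 / (info_per_game(rating_diff) * target_se ** 2))


if __name__ == "__main__":
    for gap in (0, 100, 200, 400, 800):
        relative = info_per_game(gap) / info_per_game(0)
        print(f"gap {gap:4d}: relative information {relative:.2f}, "
              f"about {games_for_std_error(gap)} games for +/- 50 ELO")
```

In this simplified model the information per game is highest against equal
opponents and falls off slowly, so opponents 100 ELO away cost only about 8%,
while an 800 point gap keeps only a few percent and pushes the game count past
a thousand.  Draws and uncertain opponent ratings would change the exact
figures.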



Last modified: Thu, 15 Apr 21 08:11:13 -0700
