Computer Chess Club Archives



Subject: Re: Does the New SSDF List Reflect the Real Strength of Programs?

Author: Uri Blass

Date: 01:35:32 10/25/01



On October 24, 2001 at 20:59:40, Dann Corbit wrote:

>On October 24, 2001 at 19:45:45, Christophe Theron wrote:
>>On October 24, 2001 at 02:20:26, Kevin Stafford wrote:
>>>>
>>>>I also would not go so far as to say that comparisons are meaningless -- just
>>>>that the numerical value connections are unknown.
>>>
>>>Yes, but is there any way to determine what the connection is without playing a
>>>fairly large set of new games between members of each pool? If so, it would seem
>>>to me that at present comparisons are indeed 'meaningless'. If there is a
>>>statistical way to determine this without new games being played, then I'm of
>>>course wrong.
>>>
>>>>
>>>>An entity that is at the top of either list will be quite strong, and one at the
>>>>bottom not so strong -- that much is obvious.
>>>>
>>>
>>>Well of course! I wouldn't say this constitutes a comparison as the original
>>>poster meant it though.
>>>
>>>-Kevin
>>
>>I'm not sure if it works mathematically, but I think it would be enough to
>>adjust the list at both ends (that is, adjust the ratings of the best and the
>>worst entities listed by the SSDF to human ratings) to get accurate ratings for
>>the whole list.
>
>It works pragmatically, but mathematically, you would have very large error
>bars.  In order to reduce the error levels to an acceptable level (say -- 50
>ELO) you would have to play thousands of games across the pools.  Playing just
>the top and bottom entrants would also not be the most effective way to get an
>accurate answer.  Better to spread them out.  I think practically speaking that
>playing the extremes is a bad idea too.  Imagine trying to find out how good you
>are by playing 50 games against Garry Kasparov, and then against a random
>legal-move generator.
>
>0-50 followed by 50-0
>
>What did you learn?  Your ELO is somewhere between 3000 and 0, +/- 500 ELO.

Christophe did not suggest playing against the same humans.

The idea is the following:
Players rated about 2600 play against Deep Fritz and Gambit Tiger on an A1200,
so you can get a rating against humans for the strongest programs.

Players rated about 1700 play against the weakest programs on the SSDF list, so
you can get a rating against humans for the weakest programs.

If you want an estimate of a new program's rating against humans, you can
assume a linear relationship between the SSDF rating and the human rating.

Here is a numeric example; you can generalize it into a formula.

2700 SSDF = 2600 against humans
1500 SSDF = 1800 against humans

2400 SSDF = 1800 + (2400 - 1500)/(2700 - 1500) * (2600 - 1800) = 2400 against humans
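The calibration described above is ordinary linear interpolation between two anchor points. A minimal sketch, using the example numbers from this post as the anchors (the function name and parameters are illustrative, not from the original):

```python
def human_estimate(ssdf_rating,
                   low=(1500, 1800),    # (SSDF, human) anchor at the weak end
                   high=(2700, 2600)):  # (SSDF, human) anchor at the strong end
    """Linearly map an SSDF rating onto the human rating scale."""
    (s_lo, h_lo), (s_hi, h_hi) = low, high
    return h_lo + (ssdf_rating - s_lo) / (s_hi - s_lo) * (h_hi - h_lo)

print(human_estimate(2400))  # reproduces the worked example: 2400.0
```

Note that the human scale is compressed relative to the SSDF scale here (a 1200-point SSDF spread maps to an 800-point human spread), so every 3 SSDF points are worth about 2 human points in this example.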

The problem with this approach is that programs that are better against humans
are not always better against computers, but it is still a better estimate than
the raw SSDF rating.

Uri




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.