Author: Uri Blass
Date: 01:35:32 10/25/01
On October 24, 2001 at 20:59:40, Dann Corbit wrote:

>On October 24, 2001 at 19:45:45, Christophe Theron wrote:
>
>>On October 24, 2001 at 02:20:26, Kevin Stafford wrote:
>>
>>>>I also would not go so far as to say that comparisons are meaningless -- just
>>>>that the numerical value connections are unknown.
>>>
>>>Yes, but is there any way to determine what the connection is without playing a
>>>fairly large set of new games between members of each pool? If so, it would seem
>>>to me that at present comparisons are indeed 'meaningless'. If there is a
>>>statistical way to determine this without new games being played, then I'm of
>>>course wrong.
>>>
>>>>An entity that is at the top of either list will be quite strong, and one at the
>>>>bottom not so strong -- that much is obvious.
>>>
>>>Well of course! I wouldn't say this constitutes a comparison as the original
>>>poster meant it though.
>>>
>>>-Kevin
>>
>>I'm not sure if it works mathematically, but I think it would be enough to
>>adjust the list at both ends (that is, adjust the ratings of the best and the
>>worst entities listed by the SSDF to human ratings) to get accurate ratings for
>>the whole list.
>
>It works pragmatically, but mathematically you would have very large error
>bars. In order to reduce the error levels to an acceptable level (say 50
>ELO) you would have to play thousands of games across the pools. Playing just
>the top and bottom entrants would also not be the most effective way to get an
>accurate answer. Better to spread them out. I think that, practically speaking,
>playing the extremes is a bad idea too. Imagine trying to find out how good you
>are by playing 50 games against Garry Kasparov, and then against a random
>legal-move generator.
>
>0-50 followed by 50-0
>
>What did you learn? Your ELO is somewhere between 3000 and 0, +/- 500 ELO.

Christophe did not suggest playing against the same humans.
The idea is the following: 2600-rated players play against Deep Fritz and Gambit Tiger on an A1200, so you can get human ratings for those two programs. 1700-rated players play against the weakest programs in the SSDF list, so you can get human ratings for the weakest programs. If you then want to estimate the human rating of any program on the list, you can assume a linear relation between the SSDF rating and the human rating.

I give an example in numbers; you can generalize it to a formula:

2700 SSDF = 2600 against humans
1500 SSDF = 1800 against humans
2400 SSDF = 1800 + (2400-1500)/(2700-1500) * (2600-1800) = 2400 against humans

The problem with this approach is that programs that are better against humans are not always better against computers, but it is still a better estimate than the raw SSDF rating.

Uri
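[Editor's note] The linear mapping in the example can be sketched in code. This is a minimal illustration, not anything from the original post: the anchor values are the ones Uri uses in his example, and the function name is hypothetical.

```python
def human_estimate(ssdf_rating,
                   ssdf_low=1500, human_low=1800,
                   ssdf_high=2700, human_high=2600):
    """Linearly interpolate an SSDF rating onto the human scale,
    using two anchor points calibrated by games against humans."""
    # Multiply before dividing so the example values come out exact.
    return human_low + (ssdf_rating - ssdf_low) * \
        (human_high - human_low) / (ssdf_high - ssdf_low)

print(human_estimate(2400))  # the worked example: 2400.0
```

Note that with these particular anchors a 2400 SSDF program happens to map to 2400 against humans, since the anchor line passes through that point; other anchors would compress or stretch the list.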